Once upon a time, movies were shown in theaters with no sound whatsoever. Audiences soon got bored
of this and so theaters employed pianists and other musicians to play musical scores while movies
played. Many musicians would improvise, mimicking the tempo of the onscreen action to heighten the
audience's emotional involvement.
Once the power of sound was realized, it was not long before movies were using the latest sound
technologies to incorporate voices and other sound effects into the experience their films offered.
Since that time, the movie soundtrack has become a sophisticated art that massively enhances the
onscreen action. Some films, such as Derek Jarman's Blue (1993), even went so far as to rely solely on
sound as the driver of the action.
Today's movies rely more than ever on technology to produce increasingly realistic and entertaining
soundtracks that keep audiences on the edge of their seats. Naturally, given the sophisticated sound
engineering technologies required, movie sound is a big business, and an expensive one at that.
One way film production companies are trying to improve their soundtracks while fighting ever-rising
costs is through automation. Typically, this has been seen on the post-production side of things. Film
companies realized early on that they could dub sounds onto soundtracks in post-production.
Professionals who came to be known as “Foley artists” would record and then overdub whatever sounds
needed to be added to the soundtrack. This was painstaking work that took a great deal of time and
effort. As more and more sounds were accumulated in the form of ‘stock sounds’, their job became a
little easier. However, the vast majority of sounds still needed to be recorded from scratch.
Well, thanks to a new field of AI development, movie industry Foley artists are about to get a
much-needed helping hand from artificial intelligence. Let's take a look at how.

AutoFoley to the Rescue

Ever since a University of Texas at San Antonio research team developed a technique that showed that
AI could help to automate the Foley process, AI-assisted filmmaking companies have been racing to
develop a commercial solution to this problem. Work is still ongoing; however, we are not far from
seeing this exciting technology made available to movie production companies and sound engineers
across the globe.
But how exactly does this AI-driven AutoFoley technology work? And how will it assist Foley artists
and film sound engineers?
In essence, what this new AI sound technology does is utilize a neural network to analyze motion and
objects in video clips. It is then able to utilize another neural network, known as a “deep sound
synthesis network,” to synthesize sounds to match what is going on in the video clip.
Going into a little more detail, the first neural network analyzes “the association of movement and
timing in video-frame images by extracting features like color, using a multiscale recurrent neural
network (RNN) combined with a convolutional neural network (CNN)” (source: “These AI-Synthesized Sound Effects Are Realistic Enough to Fool Humans,” https://thenewstack.io/these-ai-synthesized-sound-effects-are-realistic-enough-to-fool-humans/).

For faster-paced, non-linearly edited clips, where certain visual information might be missing, an
interpolation technique using CNNs and a temporal relational network (TRN) pieces together the
missing information so that the system can accurately assess which sounds are needed.
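As a rough picture of what such gap-filling amounts to at the feature level, the sketch below fills in a missing frame's feature vector by linear interpolation between its nearest known neighbors. This is a deliberate simplification: the actual CNN+TRN approach learns temporal relations rather than interpolating linearly, and `interpolate_features` is a hypothetical helper invented for this example.

```python
import numpy as np

def interpolate_features(feats, missing_idx):
    """Fill missing per-frame feature vectors by linear interpolation
    between the nearest known neighbors on either side. A crude
    stand-in for the learned CNN+TRN gap-filling described above."""
    feats = feats.copy()
    missing = set(missing_idx)
    known = [i for i in range(len(feats)) if i not in missing]
    for j in missing_idx:
        lo = max(i for i in known if i < j)   # nearest known frame before
        hi = min(i for i in known if i > j)   # nearest known frame after
        w = (j - lo) / (hi - lo)              # position between neighbors
        feats[j] = (1 - w) * feats[lo] + w * feats[hi]
    return feats

# frame 1's features are missing; reconstruct them from frames 0 and 2
feats = np.array([[0.0, 0.0], [9.0, 9.0], [2.0, 4.0]])
filled = interpolate_features(feats, [1])
print(filled[1])  # midpoint of frames 0 and 2 -> [1. 2.]
```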
The system focuses on analyzing motion so that it can understand each sound's relationship both to
the unfolding events onscreen and to the pace of the action and the cuts between shots. This is vital
for films in which the soundtrack's job is to support both the onscreen action and the relationships
between shots.
For the final act, AutoFoley systems rely on a second neural network that matches sounds from a
database, or even creates entirely new sounds, to supply the sounds missing from the soundtrack.
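The database-matching half of that final step can be pictured as a nearest-neighbor lookup over sound features. The sketch below is purely illustrative: the feature vectors and library entries are invented, and the real system's deep sound synthesis network can also generate new sounds rather than only retrieving existing ones.

```python
import numpy as np

def match_sound(clip_feat, library):
    """Pick the library sound whose (hypothetical) feature vector is
    closest to the clip's feature vector, by Euclidean distance."""
    names = list(library)
    dists = [np.linalg.norm(clip_feat - library[n]) for n in names]
    return names[int(np.argmin(dists))]

# invented 2-D feature vectors for a toy sound library
library = {
    "hoofbeats": np.array([0.9, 0.1]),
    "footsteps": np.array([0.2, 0.8]),
    "rainfall":  np.array([0.5, 0.5]),
}
print(match_sound(np.array([0.85, 0.2]), library))  # hoofbeats
```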
To see just how effective this technique is, watch this footage of a horse galloping.
https://www.youtube.com/watch?time_continue=4&v=uTSff5p-v1M&feature=emb_logo

Helping Foley Engineers to be More Productive

To see how humans would judge the results, the research team showed the finalized videos with their
AI-generated sound effects to 57 volunteers. A staggering 73% thought the synthesized AutoFoley
sounds were original sounds that had been recorded at the same time as the video. Continued training
of these AutoFoley AI systems will only serve to boost that percentage.
Once an AI platform makes this technology commercially available to sound engineers, it will be a
powerful tool that will become indispensable in their day-to-day work.
This tool's primary role will be to analyze film clips and to suggest sounds. Within a very short
time, the system will be able to create numerous alternative soundtracks for any given clip, based on
parameters set by the Foley engineers, such as horror, drama, or comedy.
The engineer will then be free to spend more time using their experience to decide which approach
and which sounds best fit the clip overall. With less legwork to do, they can devote more attention
to appraising each sound and its suitability.
Also, they will be able to present the director and production crew with multiple soundtrack
versions in the time it would normally take to complete just one. This will certainly enrich the
overall quality of the finished film soundtrack.