The ability of humans to localize and understand multiple sound sources, as well as different types of sounds, is not only a vital part of our day-to-day lives but also what makes our movie-going experience so much richer.

Within a matter of milliseconds, we humans are able to process a huge range of sounds that could be coming from anywhere. “Humans can detect sounds in a frequency range from about 20 Hz to 20 kHz.” (Source: NCBI)

https://www.ncbi.nlm.nih.gov/books/NBK10924/

While this range is not as wide as that of other mammals, such as cats and dogs, it allows us to process all kinds of different sounds from the world around us, from thunderclaps to a violin. Indeed, so remarkable is our ability to process all these sounds that we can pick individual instruments out of a large orchestra.

One of the big innovations that movie theaters have benefited from is the increasing sophistication of the audio experience they are able to offer customers. From a single piano in the days of silent film, audio systems have grown so complex that many can deliver an accurate 360-degree sound experience to every audience member, no matter where they are seated in the theater.

In recent years, this experience has even become available to home viewers thanks to the rise of relatively low-cost surround sound systems and huge advances in speaker technology.

However, these advances are just the beginning. Recent breakthroughs in artificial intelligence promise to revolutionize the way movies use sound, both in terms of movie analytics and the final soundtrack.

Let’s take a deeper look.

A recent article in MIT News announced that MIT neuroscientists have developed “an AI model that can actually localize sounds in the real world.” This groundbreaking machine learning system uses convolutional neural networks to localize sounds in any given real-world environment.

https://news.mit.edu/2022/where-sound-come-from-model-0127

The real breakthrough with this system is that, rather than simply interpreting sounds in the manner of a chatbot, for example, where the input needs to be unaffected by exterior elements such as other sounds or echoes, this new system is able to recognize and localize individual sounds across a whole range of different environments where heavy noise distortion or noise pollution is a real problem.

“Convolutional neural networks can be designed with many different architectures, so to help them find the ones that would work best for localization, the MIT team used a supercomputer that allowed them to train and test about 1,500 different models. That search identified 10 that seemed the best-suited for localization, which the researchers further trained and used for all of their subsequent studies.” – MIT News.
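To make that idea concrete, here is a minimal sketch, in Python with PyTorch, of what a search of this kind might look like. It is purely illustrative and not the MIT team’s actual code: the configurations, the training loop, and the random spectrograms standing in for real binaural recordings are all assumptions made for the example.

```python
import random
import torch
import torch.nn as nn

def build_candidate(cfg):
    """Build one small CNN from a sampled configuration (hypothetical)."""
    layers, in_ch = [], 2  # 2 input channels: left-ear and right-ear spectrograms
    for out_ch in cfg["channels"]:
        layers += [
            nn.Conv2d(in_ch, out_ch, kernel_size=cfg["kernel"], padding="same"),
            nn.ReLU(),
            nn.MaxPool2d(2),
        ]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 2)]
    return nn.Sequential(*layers)  # predicts (cos, sin) of the source azimuth

# Toy stand-in data: 64 random "binaural spectrograms" and their azimuths.
specs = torch.randn(64, 2, 64, 64)
angles = torch.rand(64, 1) * 2 * torch.pi
targets = torch.cat([torch.cos(angles), torch.sin(angles)], dim=1)

results = []
for _ in range(20):  # the MIT team searched roughly 1,500 models
    cfg = {
        "channels": random.choice([[16, 32], [32, 64], [16, 32, 64]]),
        "kernel": random.choice([3, 5]),
    }
    model = build_candidate(cfg)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):  # a few quick training steps per candidate
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(specs), targets)
        loss.backward()
        opt.step()
    results.append((loss.item(), cfg))  # last training loss as a rough score

best = sorted(results, key=lambda r: r[0])[:10]  # keep the 10 best candidates
print(best)
```

In a real search, each candidate would of course be trained for much longer and scored on held-out recordings rather than its own training loss.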

The team then trained the AI system in a controlled environment, where various background noises and echo patterns were used to improve its ability to localize sounds and distinguish them from echoes.

It was trained on more than 400 sounds, including human voices, natural sounds such as thunder, and machine sounds like car engines. Finally, the system’s microphones were placed inside dummy human ears so that it could benefit from the same physical features that funnel sound into our inner ears and enable sound localization.
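As a rough illustration of that kind of training setup, the sketch below mixes a clean signal with background noise at a chosen signal-to-noise ratio and adds a crude single-tap echo. The function names and parameters are hypothetical, the signal is simplified to a single channel, and this is not the MIT team’s pipeline.

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Scale `noise` so that the mix has the requested SNR in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def add_echo(signal, sr, delay_s=0.25, gain=0.4):
    """Add one delayed, attenuated copy of the signal (a crude echo)."""
    delay = int(sr * delay_s)
    out = np.copy(signal)
    out[delay:] += gain * signal[:-delay]
    return out

sr = 16000
t = np.arange(sr * 2) / sr
clean = np.sin(2 * np.pi * 440 * t)   # stand-in for a recorded training sound
noise = np.random.randn(len(clean))
augmented = add_echo(add_noise(clean, noise, snr_db=10), sr)
```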

The researchers found that the system closely mirrored human performance. For example, when the number of sounds or the level of background interference was increased, the system found it harder and harder to distinguish sounds, exactly as we humans do.

The development of this exciting new machine learning system will have a number of very useful applications in the movie production process and in each of our movie experiences.

On the AI-assisted moviemaking side of things, the main advantage of this technology will be the increased accuracy of existing tools, coupled with the ability of these AI platforms to offer a range of new tools that give filmmakers deeper insights.

An obvious example will be the increased accuracy of video analytics tools, which analyze film clips to provide insights such as feedback on acting performances, the suitability of a sound in the soundtrack, and the identification of genre elements. Naturally, background noise and sound interference hinder the accuracy of current AI systems.

Along with this boost in accuracy, AI-assisted technology platforms will be able to offer a range of tools that use the audio soundtrack as the basis for in-depth analytics. Examples might include help with level mixing for specific sound formats, e.g., 2.1, 5.1, 6.1, and 7.1 surround sound.
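As a small, concrete example of one such mixing task, the sketch below downmixes a 5.1 channel layout to stereo using the commonly cited ITU-style coefficients (roughly -3 dB on the centre and surround channels, with the LFE channel omitted, as is typical). An AI tool of the kind described here might automate choices like these; this helper is just a hand-written baseline, and the channel naming is an assumption.

```python
import numpy as np

def downmix_51_to_stereo(ch):
    """ch: dict of NumPy arrays keyed L, R, C, LFE, Ls, Rs -> (left, right)."""
    g = 1 / np.sqrt(2)  # about -3 dB for the centre and surround channels
    left = ch["L"] + g * ch["C"] + g * ch["Ls"]
    right = ch["R"] + g * ch["C"] + g * ch["Rs"]
    peak = max(np.abs(left).max(), np.abs(right).max(), 1.0)
    return left / peak, right / peak  # normalize so the mix cannot clip
```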

As another offshoot of this, AI systems will be able to suggest sound additions and modifications that a sound editor could make to enhance the effect of the audio in a scene. This might mean adding sounds that aim to confuse or disorient an audience during a horror scene, for instance.

These and other tools would ultimately help automate some of the more complex work of shaping how a soundtrack affects audiences watching the film in different environments.

Such a system would eventually have the power to help filmmakers create tailor-made soundtracks for a wide range of different environments. It might even allow movie theater or home sound systems to automatically adjust their settings to create the optimum audio experience for whatever environment the movie is being played in.

While these are very early days yet, this exciting breakthrough promises to unlock yet more of the enormous power of AI to help create a healthier and more exciting movie business in the future.