The 3D mix


I encourage you to aim for what I call a three-dimensional mix: one that works on the horizontal, the vertical, and in depth.

Let’s take them one at a time.


The Horizontal

This is controlled by pan. Whether you’re working in stereo or surround, the principle’s the same. In a surround mix most of your sound is still going to be at the front; you just have more speakers to work with. Each track in your video or audio NLE will have a pan control, and you can place the sound anywhere you want within that stereo or surround field. As mentioned, dialogue is always placed in the middle. A lot of Foley and SFX will also be placed in the middle, although a surround mix will use the left and right speakers to create a phantom centre for them, rather than the centre channel. This keeps them isolated from the dialogue.
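If you’re curious what a pan control is doing under the hood, here is a minimal sketch in Python. It assumes a constant-power (sine/cosine) pan law, which is one common choice; your NLE may well use a different law. The function name and the -1.0 to 1.0 pan range are my own for illustration.

```python
import math


def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) gains.

    pan runs from -1.0 (hard left) through 0.0 (centre) to 1.0 (hard right).
    Assumes a constant-power law: left**2 + right**2 stays the same at every
    pan position, so the sound does not seem to dip or jump as it moves.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    # Map pan onto a quarter circle: 0 is hard left, pi/2 is hard right.
    angle = (pan + 1.0) * math.pi / 4.0
    return sample * math.cos(angle), sample * math.sin(angle)


# A centred sound goes equally to both speakers (the phantom centre).
left, right = constant_power_pan(1.0, 0.0)
```

With the pan at centre, both speakers get the same signal at about 0.707 of full level, which is how two speakers create a phantom centre.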

Moving a sound across the speakers to follow the action is easy, but usually not recommended. It draws attention to itself. Of course there are exceptions. In Gravity they had the voices of the astronauts at the beginning of the film coming from left and right sides. In that case it gave you a real sense of the ‘space’ they were in and their distance from one another.


The Vertical

This is controlled using volume level and EQ. When I say vertical I don’t literally mean up and down, but how the sounds within the mix are layered. You will often have more than one sound at a time. Dialogue, Atmos, Foley, Sound Effects and Music might all be present in a scene. But if you have them all happening at once it becomes a mess. There’s too much going on for our ears and our brains to know what to focus on.

As in cinematography where you use composition, focus and light to guide your audience’s eye to what’s important, we do the same with sound, guiding their ear by how we mix the sounds. Especially when there’s a lot going on.

Walter Murch uses the example of the helicopter attack sequence in Apocalypse Now, a film he sound designed and co-edited. There’s a lot going on in that scene. He came up with the principle of no more than 2½ sounds at a time. That’s all our brains can focus on. Beyond that, individual sounds are instead interpreted as a composite sound image.

Music is the best example of this: a group of sounds combined to create a single sound image. This means that if there are more than three sounds at once, the audience doesn’t know what to listen to. It just becomes noise (an unpleasant composite sound). You therefore must choose which two or three sounds you want people to focus on. If there’s dialogue, then that gets preference. If we can see the source of a very obvious sound effect or Foley noise on screen, then we should hear it. That leaves half a sound for something we won’t really notice as much, like atmos or music (both composite sounds). That’s moment to moment. In the next moment what those 2½ sounds are will invariably change.

Murch claims he always strived to have as few sounds as possible. For him the ideal soundtrack was one that had no sound whatsoever. It all happened in the viewer’s brain.

You’d be amazed how often an audience can hear things that aren’t even there, if you guide them well.

One or two key sounds will really help your audience know what to focus on. Finding those that best serve the story is what good sound design is all about. It also keeps the mix under control and stops it becoming a cacophonous mess. Do it right and your audience won’t even notice what’s missing, as long as the important sounds for that moment are there.

Murch’s law of two-and-a-half ensures the vertical axis of our mix is lean and focused. You balance the sounds using their respective volume levels, but there is still the chance they may clash sonically. This happens if the important part of the sounds is in the same frequency range. This is where EQ comes in.

For example: the human voice is in the mid-range. In fact most of the sounds we hear in everyday life are in the mid-range. That’s what our ears are most attuned to. If you have dialogue occurring at the same time as someone is revving their car in a scene, it’s gonna be hard to hear the dialogue. You need the sound effect of the car, but you also need the audience to understand what the characters are saying.

There are two ways to make the dialogue clearer. The first is to turn down the volume level on the car, but then it sounds unrealistic. The other way is to apply some EQ to the noisy car, pulling it down at around 1 kHz. This’ll leave a hole in the mid-range for the voice to poke through. The car will still sound loud, all that bottom end will still be there, and though the character of the sound will change slightly, no-one but hard-core motor geeks will notice.
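To make that EQ cut concrete, here is a rough sketch of a single peaking filter in Python. It uses the widely published Audio EQ Cookbook biquad formulas. The 4 dB cut, the Q of 1.0 and the 48 kHz sample rate are assumptions picked for illustration, not recommended settings, and the function names are hypothetical. In practice you would simply reach for the EQ plug-in in your NLE.

```python
import cmath
import math


def peaking_eq_coeffs(f0, gain_db, q, sample_rate=48000):
    """Biquad coefficients for a peaking EQ (Audio EQ Cookbook formulas).

    A negative gain_db cuts around f0, a positive one boosts.
    Returns (b, a), normalised so that a[0] == 1.
    """
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db_at(b, a, freq, sample_rate=48000):
    """Gain of the filter, in dB, at a single frequency."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq / sample_rate)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


def apply_biquad(samples, b, a):
    """Run a list of samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


# Assumed settings: a 4 dB cut at 1 kHz on the car track.
b, a = peaking_eq_coeffs(f0=1000, gain_db=-4.0, q=1.0)
cut_at_1k = gain_db_at(b, a, 1000)  # the full cut lands at the centre frequency
rumble = gain_db_at(b, a, 60)       # the bottom end is left almost untouched
```

The point of the sketch is the shape of the curve: the cut is concentrated around 1 kHz, while the low rumble of the engine passes through practically unchanged. That is why the car still sounds big.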

Alternatively you could tweak the voice up at around 1-2 kHz. This is a good principle if you ever have a recording that seems a bit dull and muddy. Boosting the higher frequencies slightly will help emphasize the consonants in the dialogue. If we can hear the consonants clearly, we’re better able to understand the words.

This is just one example. The same principle applies to Music, Sound Effects, Foley or anything that’s getting in the way of other sounds. Restrain each sound to a distinct frequency band. Give them their own space to breathe. This is what I mean by vertical mixing: using level and subtle EQ changes so that every sound at every moment has its own place sonically and isn’t clashing with other sounds. In musical terms it’s called being ‘in the pocket’. Everything just fits together nicely, creating a master composite sound through well-balanced volume level, frequency range and pan.


The Z-Axis or Depth

The third dimension is our z-axis, which corresponds to depth of field in the visuals. We control this with reverb. This tells us how far away the sound is, as well as the space it’s in. This is especially important when you want to create the illusion of a different space, such as when shooting against green-screen.

In music production it’s common to mix different reverbs. For sound design you don’t want to do this: one space, one reverb. The balance of ‘dry’ (unprocessed sound) to ‘wet’ (sound with the reverb applied) will give us a sense of how distant we are from the sound source.
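The dry/wet balance can be pictured as a simple blend. Here is a minimal Python sketch that assumes you already have the reverb’s output as a list of samples the same length as the dry signal. The function name and the 0.0 to 1.0 scale are hypothetical, and a real reverb plug-in’s mix knob may not be a straight linear blend like this one.

```python
def mix_dry_wet(dry, wet, wet_amount):
    """Blend a dry signal with its reverberated (wet) version.

    wet_amount runs from 0.0 (all dry, sounds close) to 1.0 (all wet,
    sounds far away). Assumes a plain linear crossfade.
    """
    if not 0.0 <= wet_amount <= 1.0:
        raise ValueError("wet_amount must be between 0.0 and 1.0")
    if len(dry) != len(wet):
        raise ValueError("dry and wet must be the same length")
    return [(1.0 - wet_amount) * d + wet_amount * w for d, w in zip(dry, wet)]


# Made-up sample values, just to show the call.
dry = [1.0, 0.5, 0.0, 0.0]
wet = [0.0, 0.0, 0.4, 0.2]
close_up = mix_dry_wet(dry, wet, 0.1)   # mostly dry: feels near the camera
far_away = mix_dry_wet(dry, wet, 0.8)   # mostly wet: feels deep in the room
```

Keep the same reverb for every sound in the scene and change only the blend per sound. That way everything sits in one space but at different distances.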

We can cheat this though. In a very wide shot of two characters walking through a vast cathedral, we can use close-up sound to hear them clearly, even though the camera’s POV is quite distant. It’s unrealistic, but it serves the story better. Remember: two things at once – what we see and what we hear.
