Voice Over Studio tips

A Basic Guide – Wild Recording, To Picture, Source Connect, Split Guide Tracks and Stems.

October 16, 2018



Some of these terms are simply gobbledygook to many folk working in media when they come into contact with the audio world.

So I’d like to try and give you a brief go-to guide for when you’re about to take on the audio part of a project.

While audio manipulation has come on in leaps and bounds over the last few years, with some amazing restoration software and applications that can separate voice from music, these all come at a cost to your production.

Simply put, the better you record your audio, the better your final product will sound. Simple, right?

So when you are looking to book a studio for a voice-over, a narration or a sound mix of a production, what do you need to book?

Here is a quick view list.


Wild Recording:

A vocal recording from script with no video or backing tracks. It’s good to read the script through slowly from start to finish to ensure you have an understanding of how long your script will run.

You may need parts of the script timed to fit segments of your project, and this can be done wild too. By reading each segment of your script evenly you can time each line, phrase or paragraph as required. You or your editor can then slot each recorded section into your project.
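Before the session you can get a rough idea of how long each segment will run. The sketch below is only an illustration: the function names and the 150 words per minute pace are my assumptions, not a studio standard, and a stopwatch read-through is still the real test.

```python
def estimate_read_time(script_text, words_per_minute=150.0):
    """Return a rough read time in seconds for a piece of script.

    The default pace of 150 words per minute is an assumed, even
    narration pace. Change it to suit your voice artist.
    """
    word_count = len(script_text.split())
    return word_count * 60.0 / words_per_minute


def time_segments(segments, words_per_minute=150.0):
    """Return a dict of segment name -> estimated seconds (1 decimal place)."""
    return {
        name: round(estimate_read_time(text, words_per_minute), 1)
        for name, text in segments.items()
    }


if __name__ == "__main__":
    # Hypothetical script segments, purely for illustration.
    script = {
        "intro": "Welcome to our studio, where every project starts with a great recording.",
        "outro": "Thanks for listening.",
    }
    for name, seconds in time_segments(script).items():
        print(f"{name}: about {seconds} seconds")
```

If an estimate comes out longer than the gap it has to fit, trim the script before the session rather than asking the artist to rush.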


To Picture:

More often than not the recorded sections of voice will need to hit specific points in a video to punctuate the story. You can break the script down into lines and phrases to help the voice artist see the words on the page as they need to be read.

For long-form documentary projects it is best to provide timecodes in the margin of the script at the start of each section, matching the burnt-in timecode on the video. Burnt-In Time Code (BITC) is only required for long-form projects.
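If you want to check how long a section runs between two timecodes from the script margin, the arithmetic can be sketched as below. This is a minimal example that assumes non-drop-frame timecode in the form HH:MM:SS:FF at 25 frames per second. The function names are my own, and drop-frame timecode (used with 29.97 fps video) is not handled.

```python
def timecode_to_frames(timecode, fps=25):
    """Convert a non-drop-frame 'HH:MM:SS:FF' timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame value {frames} is not valid at {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(total_frames, fps=25):
    """Convert a frame count back to a non-drop-frame 'HH:MM:SS:FF' string."""
    seconds_total, frames = divmod(total_frames, fps)
    minutes_total, seconds = divmod(seconds_total, 60)
    hours, minutes = divmod(minutes_total, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


if __name__ == "__main__":
    # Hypothetical section start and end, as read from the BITC.
    start = timecode_to_frames("10:00:12:00")
    end = timecode_to_frames("10:00:27:12")
    length = end - start
    print("Section length:", frames_to_timecode(length), f"({length / 25:.2f} seconds)")
```

The result tells you how much time the voice artist has for that section of script.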


Split Guide Tracks:

Normally you will be expected to supply ‘Guide Tracks’ with the visuals. Ideally, supply a set of separate audio tracks: music only, sound effects only and guide dialogue only. Alternatively, supply an M&E track (music & effects) with a separate guide dialogue track. This way the engineer can feed the M&E backing tracks to the artist to help them get the mood of the required voicing, while the engineer uses your guide dialogue track to time the new voice to.

Remember they are just guide tracks so keep any sound effects quite low in the M&E. The most important thing is to keep the guide dialogue separate.

Your audio tracks can be embedded into the supplied QuickTime. The sound studio’s DAW (possibly Pro Tools or Logic) will separate the elements during the import process into its session.

Or, you can supply the guide audio tracks separately, but cut to the exact length of the video.



For bigger projects the editors may supply an embedded OMF. This is normally done if the sound studio will be completing a full audio mix along with any voice-overs (VO).


ADR (Automated Dialogue Replacement) and Lip Sync:

This is similar to Picture work but involves actors and voice artists syncing their voice to the lips of the artist seen on screen, or adding lines when the lips are out of view. ADR is used to fix or replace actors’ lines of dialogue, possibly because background noise on set spoiled the line or because the line doesn’t have enough emotion.

You may want an individual in a TV commercial to have a different accent or, for fun, change a female vocal to a male one for comedy effect. Naturally, all animated productions will require voices to be dubbed to match the characters and bring them to life.

However, be aware that even a short section can be tricky. I once had a five-second section for an anti-smoking campaign where the scene cut mid-dialogue onto a child who needed an adult voice dubbed over and lip-synced. Work like this may be charged at a higher rate than a simple Voice to Picture session.

ADR needs careful preparation prior to the session. For a small project it can be recorded quite easily. If it’s a feature film, a professional team will get involved, and that’s another blog for another day.


ISDN and Source Connect:

These are products used to connect one audio facility to another audio facility with a broadcast quality audio link.

ISDN uses a physical unit (a codec) in each studio that hooks a microphone feed into the telephone system.

ISDN is due to be decommissioned by the telecoms providers here in the UK over the next few years, so we have seen the likes of Source Connect, among others, being implemented by many sound studios.

Source Connect is software supplied by Source Elements using broadband connectivity between studios.

This is ideal, for example, if you are in London and your artist is in Newcastle or Hollywood. You book a local studio and a studio where the artist is based. The studios link up and you complete a session as if only a sheet of glass or two separates you both. These sessions can be wild or to picture. You will need to ensure that both studios receive the scripts and all guide audio and video media in good time prior to the session.


There are many more parts I could add to this blog, but to keep it from dragging on, here are a few others that you may need to know or may find helpful.


Sound Design:

Adding sound effects to create an environment or to punctuate visual effects.


Sound Restoration:

If you get the recording wrong!

Using mostly iZotope RX, among many other applications, a good audio engineer can save your bacon. It will be expensive, so get the recording right first time around.

You should also allow clean up time prior to the end of your VO session.

De-breathing and lip smack removal take time. If you book an hour of recording, allow enough time within that hour for the dialogue clean-up, or allocate a little extra time at the end of the session if it is needed. Remember, this may be added to the invoice.



Stems:

Each part of your final audio mix is broken down into its individual parts, known as stems.

Music (MX), Dialogue (DX), Effects (FX), Background (BG) and Foley (FO). The final two are normally for feature films.

When added together, the stems should sound exactly like the final mix.

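For documentary productions there is also the Mix Minus Narration stem: the full mix without the narration, non-dipped, meaning the music and effects are not lowered where the narration would sit.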

You should ask for these along with a final mix if you are intending to make foreign language versions of your project.

Many broadcasters and majors will expect you to provide these along with the final audio. You must check with them for a delivery spec.
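The rule that the stems, when added together, should match the final mix can be checked with what engineers often call a null test: sum the stems, subtract the final mix and see whether anything is left. The sketch below shows the idea on a few made-up sample values held in plain Python lists. The function names and the tolerance are my assumptions, and in practice your sound facility would do this inside the DAW.

```python
def sum_stems(stems):
    """Add the stems together sample by sample."""
    lengths = {len(stem) for stem in stems}
    if len(lengths) != 1:
        raise ValueError("all stems must be exactly the same length")
    return [sum(samples) for samples in zip(*stems)]


def null_test(stems, final_mix, tolerance=1e-6):
    """Return True if the summed stems match the final mix within tolerance."""
    summed = sum_stems(stems)
    if len(summed) != len(final_mix):
        return False
    # The residual is what is left after subtracting the mix from the summed stems.
    residual = max((abs(a - b) for a, b in zip(summed, final_mix)), default=0.0)
    return residual <= tolerance


if __name__ == "__main__":
    # Made-up sample values, purely for illustration.
    music = [0.10, 0.20, -0.10]
    dialogue = [0.30, 0.00, 0.20]
    effects = [0.05, -0.05, 0.00]
    final_mix = [0.45, 0.15, 0.10]
    print("Stems match the final mix:", null_test([music, dialogue, effects], final_mix))
```

If the test fails, something was processed on the final mix that is not in the stems, which is worth raising with the studio before delivery.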


Finally, a word on R128 and TVC audio.

This is where your audio has to meet a loudness requirement set out by the major broadcasters. R128 itself is the loudness recommendation published by the European Broadcasting Union (EBU), and it is beginning to show up in more than broadcasting now. Online suppliers of media are starting to request their own versions of R128 standards. Ask your sound facility what you may need, depending on where your audio is going to be used.
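To give a feel for what meeting a loudness target involves, here is a minimal sketch of the gain arithmetic. It assumes you already have an integrated loudness reading from a proper loudness meter, as nothing here measures audio. The targets listed are the programme loudness figures from EBU R128 (-23 LUFS) and the US ATSC A/85 practice (-24 LKFS); the function names are my own, and your delivery spec always takes priority.

```python
# Programme loudness targets. Always confirm against the delivery spec.
LOUDNESS_TARGETS = {
    "EBU R128": -23.0,   # LUFS
    "ATSC A/85": -24.0,  # LKFS (the same scale under a different name)
}


def gain_to_target(measured_lufs, target_lufs):
    """Return the gain change in dB needed to move a mix to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20.0)


if __name__ == "__main__":
    measured = -20.0  # hypothetical meter reading for the mix
    for standard, target in LOUDNESS_TARGETS.items():
        gain_db = gain_to_target(measured, target)
        print(f"{standard}: change level by {gain_db:+.1f} dB "
              f"(multiply samples by {db_to_linear(gain_db):.3f})")
```

A simple level change like this does not check peak limits or anything else in the spec, which is one reason a separate mix per territory is usually the engineer's job.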

TVC: Television Commercials. The video is set to an agreed length, while the audio has to start 6 frames after the start of picture and end 6 frames before the last frame of picture, while adhering to the loudness rules set out by the country in which it is intended to be broadcast. You will need different audio mix levels for use in the USA, the UK and parts of Europe.
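The 6 frame rule is easy to work out for any commercial length. The sketch below assumes 25 frames per second, counts frames from zero and treats the rule as 6 silent frames at each end of the picture. The function name and that exact counting convention are my assumptions, so check the broadcaster's delivery spec for the precise wording.

```python
def tvc_audio_window(duration_seconds, fps=25, silence_frames=6):
    """Return (first_audio_frame, end_frame, audio_length_in_frames).

    Frames are counted from 0 and end_frame is exclusive, so audio
    occupies frames first_audio_frame up to end_frame - 1.
    """
    total_frames = int(round(duration_seconds * fps))
    first_audio_frame = silence_frames
    end_frame = total_frames - silence_frames
    if end_frame <= first_audio_frame:
        raise ValueError("commercial is too short for the silent frames required")
    return first_audio_frame, end_frame, end_frame - first_audio_frame


if __name__ == "__main__":
    for length in (10, 20, 30):
        first, end, audio_frames = tvc_audio_window(length)
        print(f"{length}s spot: audio runs from frame {first} to frame {end - 1}, "
              f"{audio_frames} frames ({audio_frames / 25:.2f} seconds)")
```

So the usable audio is always a little shorter than the picture, which matters when you are timing a script to fit.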


This is only a brief overview to help those who do not generally need to have a full technical background in audio but may need to converse and/or work with a post-production audio provider.