My lectures usually start with trying to take a historic perspective on the subject matter. I’m not changing my ways here but the ‘history’ is likely to merge in with the opinion and the practical aspects.
So what makes a good sound?
You will spot straight away that this question doesn’t have a simple answer; in fact I’m sure it can’t be answered at all…. What do we mean by ‘good’?
I want to treat ‘good’ as being ‘pleasant’, ‘natural’ and ‘acceptable’.
Our ears are used to hearing sounds in the natural world; particularly human sounds and speech.
When we record those sounds and play them back, we want them to sound either as natural as possible, or like a ‘nicer version’ of the original sound.
So, to be practical, let’s look at the recording process and see what happens to the sound, see what is important and what’s not.
A recording of speech needs first a microphone, and that mic has to be placed at a distance from the mouth of the speaker so that the speech will sound natural….. and immediately we are faced with a compromise because of the way our hearing works. We are used to hearing speech at a distance of about 2 metres, but if we try to record at that distance we start to pick up too many sound reflections from walls and other objects in the room, so to compromise, the mic needs to be closer. So we choose a distance of about 20cm. That keeps the speech louder and cuts down on outside noise, but it also changes the character of the speech because of the very close proximity; it tends to make the low frequencies more pronounced, but that’s not too serious and certainly not unpleasant.
We have assumed that the microphone is a good one; say, a TFPRO GP, a large diaphragm capacitor mic with a good extended response.
That speech signal from the speaker is being converted into very small electrical signals that go to a mic preamplifier. This is needed to get the signal high enough to feed into the recording system, whatever that may be; for the moment we will say it’s a computer sound card and a lump of ‘flash’ memory.
It’s here at the mic preamp that we hit the first really critical problem area in the system. The vast majority of microphone amplifiers nowadays (including those called ‘valve’) make use of so-called specially designed low noise audio amplifier ICs. If you are an engineer you may well look up the ‘recommended application circuit’ for an IC, together with its specification, and, because the manufacturer dearly wants you to use his chip, you will be very happy with the simplicity of the application circuit, and will marvel at the low noise, the low distortion and the general performance….. but sadly, there’s a whole lot that the IC manufacturer doesn’t tell you; in fact, there’s a whole lot that even he doesn’t actually know about his own product.
If we could use the IC in the way that the manufacturer intended, then it would perform wonderfully, of that there is no doubt. But the real world is not quite like the test lab! Those electrical signals coming from the microphone are at a volume level of say -60dB (this is a very loose way of talking, but it’s best to try to keep it simple!) which means that the ‘operating level’ of the signal is at -60dB. So to get the signal up to an operating level of -4dB, which would be reasonably sensible for a digital recording, one would expect to set the mic preamp to have a gain of 56dB. Fine, that gives us the right working level. BUT there’s a problem: although the speech from the microphone is at -60dB, the ‘esses’ and ‘tees’ produce momentary voltages which would relate to a peak level of nearer -25dB, or even higher. These transient peaks don’t sound loud because they are of such short duration; the ‘T’ and ‘K’ consonants are the obvious worst offenders.
You might say ‘but if you don’t notice the peaks, then why not clip them off?’ And that’s a perfectly good idea, and in practice that’s what happens… but we are moving too fast; we have a transient of -25dB and the gain of the amplifier is set to 56dB. That means that the transient is going to come out of the preamp at a level of 56dB above -25dB, which is +31dB! Now, even ignoring the fact that our recording system could not handle anything like that level, the preamp will have given up trying at about +22dB; it has ‘clipped’, reaching the maximum output it is capable of. And this is where the IC designer ought to be learning a lesson: when the IC is operating within its limits, the distortions are extremely low, but if the IC hits ‘clip’, technically the signal hits the rail and negative feedback ceases to work. Again that’s fine, but the problem comes in the few microseconds after the overload; the IC is struggling to get back to normal and in doing so causes very short-term distortions to the audio that the IC designer never sees, and would ignore anyway because ‘it’s outside the specified operational conditions!’
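The arithmetic is easy to check; a few lines of Python using the figures quoted above (the helper and constant names are mine):

```python
# A quick check of the level arithmetic above; the dB figures are the
# ones quoted in the text, the names are illustrative.

def output_level_db(input_db, gain_db):
    """Level out of an ideal (non-clipping) amplifier: dB values simply add."""
    return input_db + gain_db

SPEECH_LEVEL = -60.0    # average speech level from the mic (dB)
TRANSIENT_PEAK = -25.0  # momentary 'T'/'K' consonant peaks (dB)
GAIN = 56.0             # preamp gain chosen to reach the -4dB working level
CLIP_POINT = 22.0       # maximum output the preamp can manage (dB)

working = output_level_db(SPEECH_LEVEL, GAIN)   # -4dB: fine
peak = output_level_db(TRANSIENT_PEAK, GAIN)    # +31dB: demanded of the amp
overshoot = peak - CLIP_POINT                   # 9dB of transient gets clipped

print(working, peak, overshoot)  # -4.0 31.0 9.0
```

The point is that the average level and the transient peaks ask two incompatible things of the same gain setting; the 9dB overshoot is what the preamp has to clip off.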
So to sum it all up, two transients happen when our speaker says ‘cat’, the transients are so short that we don’t hear them, but the mic amp momentarily clips and the normal parts of the speech immediately following the transient are distorted; only for a moment, but it is enough to be noticeable as something ‘not right’.
This instability induced distortion is subtle in its effect but is one of the main reasons for the difference in ‘warmth’ of sound between different mic preamps.
Once the audio signal has got out of the mic preamp, whether we like it or not its dynamic range has been reduced, because those annoying transients have been clipped off! So now the ‘quality’ of the sound is affected almost totally by straightforward harmonic distortion, and frequency response or bandwidth.
When we talk about ‘distortion’, engineers mean harmonic distortion, the way an amplifier subtly alters the shape of an audio signal. In nature (and in physical musical instruments) all sounds could be called ‘distortions’ of pure waves, and all these distortions affect the waveform in a ‘one-sided’ way; that is, the top of the wave is squashed more than the bottom, or vice-versa. This type of distortion adds harmonic content to the sound in a musically pleasing way; it is called ‘even order distortion’.
In integrated circuit amplifiers the arrangement of components is symmetrical for the top of the wave and the bottom of the wave. This means that any distortion that occurs is also symmetrical, but this is not a natural sound! The effect on the ear of this 3rd harmonic, or ‘odd order’, distortion is extremely unpleasant, and very slight amounts of it can be heard. When the distortion is just discernible, the sound starts to be ‘brittle’ and ragged. Higher levels of distortion sound as though the signal has been treated with a cheese grater.
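The even/odd difference is easy to demonstrate numerically. Here is a rough Python sketch using hard clipping as a stand-in for amplifier overload (a simplification: real circuits round the wave off more gradually):

```python
import math

# Clip a pure sine wave symmetrically (IC-style) and asymmetrically
# ('natural', one-sided squashing), then measure the harmonics.

N = 1024
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]  # one pure cycle

def clip(sig, top, bottom):
    """Hard-limit a signal between bottom and top."""
    return [min(top, max(bottom, s)) for s in sig]

def harmonic_amp(sig, h):
    """Amplitude of harmonic h, measured with a single-bin DFT."""
    re = sum(s * math.cos(h * 2 * math.pi * k / N) for k, s in enumerate(sig))
    im = sum(s * math.sin(h * 2 * math.pi * k / N) for k, s in enumerate(sig))
    return math.hypot(re, im) * 2 / N

symmetric = clip(sine, 0.7, -0.7)   # top and bottom squashed equally
asymmetric = clip(sine, 0.7, -1.0)  # only the top is squashed

# Symmetric clipping produces only odd harmonics (3rd, 5th, ...);
# asymmetric clipping adds a clear 2nd (even) harmonic as well.
print(harmonic_amp(symmetric, 2), harmonic_amp(symmetric, 3))
print(harmonic_amp(asymmetric, 2))
```

Run this and the symmetric wave shows essentially zero 2nd harmonic but a strong 3rd, while the one-sided wave shows a healthy 2nd harmonic, which is the ‘even order’ content the ear finds so much more forgiving.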
But let’s assume that we have used a really fine mic amp and we have minimised all those distortions, how is the signal sounding?
We are taught that we hear frequencies between 20Hz and 20KHz. We are taught wrong!
Those figures are a comfortable compromise but don’t really mean a lot. At the low frequency end we can recognise with our ears frequencies down to about 30Hz although there is no clear cut-off. More important is the fact that we ‘experience’ low frequencies from about 100Hz downwards with the whole of our bodies, so frequencies as low as say 8Hz are important if we are trying to achieve realism.
At the other end of the spectrum, we can hear up to around 12KHz clearly, although this depends a lot on how many beers you drank the night before! At 12KHz the sound is audible, but the ‘quality’ of the sound is not discernible. Above 12KHz we can tell that there is something there, right up to 20KHz and even beyond.
In the mid ranges the ear is amazingly sensitive to the tiniest effects of distortion or phase shifts, particularly around the 3KHz region, and incidentally, this is where a great deal of directional information is used by the brain: A signal source that is off to one side can be pin-pointed because of the minute time difference in the sound reaching the two ears…. I did some experiments on stereo direction sensing back in the mid 80s and proved conclusively that it is actually phase (in the form of time difference) that gives us most of our direction sensing ability.
In short, the ear is fantastically sensitive to being fooled in the mid ranges, that is, say 500Hz up to 3.5KHz. Outside those frequencies it’s less particular, but there are some odd effects to do with extremes of bandwidth.
It’s an interesting experiment to find out what happens to perception when bandwidth is restricted. Listening to any sort of speech or music if the bandwidth is restricted at the bottom end or the top end, it is instantly recognised. However, if the bandwidth is restricted in a controlled way, removing low frequencies and high frequencies equally, then the result to the ear is remarkably acceptable even when the restriction is severe.
This effect has been very useful in the past of course…. In the cinema, the bandwidth of optical film was about 150Hz up to about 4.5KHz, and ‘lumpy’ at that, but it still sounded acceptable because the signal was filtered top and bottom (using the ‘Academy’ filter).
So what would be a practical bandwidth for digital recording? It depends on what you are trying to achieve; I think it’s possible to set the bandwidth to suit the programme… There is no point in extending the bandwidth below say 30Hz if the end product is likely to be listened to on small radios or in cars. The problems are not only in what’s possible to hear, but also a physical problem with loudspeakers.
The distance a loudspeaker moves is inversely proportional to the frequency.
So if the cone moves a distance of 0.1mm at 1KHz, then at 100Hz it will move 1mm, and at 10Hz it will move 10mm! So extended frequency responses may sound attractive, but they are not very practical.
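Taking that rule at face value, a tiny Python sketch (a simplification: real excursion also depends on the driver, the enclosure and the sound level):

```python
# The 1/f excursion rule quoted above, scaled from the example figures
# in the text (0.1mm of cone movement at 1KHz).

def excursion_mm(freq_hz, ref_freq_hz=1000.0, ref_excursion_mm=0.1):
    """Cone excursion scaled inversely with frequency from a reference point."""
    return ref_excursion_mm * ref_freq_hz / freq_hz

for f in (1000.0, 100.0, 10.0):
    print(f"{f:6.0f}Hz -> {excursion_mm(f):5.1f}mm")
```

Two decades down in frequency means a hundred times less excursion headroom per decade of level, which is why chasing response below 30Hz is so hard on a real loudspeaker.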
If the bandwidth is limited to 30Hz at the bottom end, then the HF end should also be limited to about 14KHz so that the sound remains ‘balanced’.
Equaliser: Friend or Foe?
I have talked about quality and distortion and made some comments about bandwidth, so how do we ‘engineer’ or alter the sound of our recording? The first device to spring to mind is the equaliser.
Equalisers came from the film industry; the very name gives the game away, the purpose of the equipment was to ‘equalise’ the sound between various takes where the microphone may have been different distances from the person speaking. The first equalisers were simple tone controls where the sound engineer could alter the ‘weight’ of the recording, but almost immediately, engineers recognised that the most useful sort of modification to the sound was some sort of ‘lift’ in the upper mid frequency band. They found that by altering the intensity of this area of sound they could achieve an element of ‘distance’ effect.
The first commercial equalisers were of the ‘Pultec’ type; these were passive circuits that bent the response, followed by an amplifier to make up the gain that was lost in the process.
They were basic circuits dealing with two selectable frequencies. You had the choice of boost or cut on each frequency, and a ‘bandwidth’ control that operated on both frequencies; but it was all a little ‘hit and miss’. In reality, a great variety of sounds was possible by combining boost and cut at a single frequency; what this does is to introduce variations in the phase response, which has profound effects on the sound.
Equalisers as we know them have their roots in Hi Fi tone controls from the 1950s and 60s. In the mid 50s an engineer working for EMI at Hayes just outside London, a man called Peter Baxandall, had an idea for a circuit type that had some supreme advantages over anything that had been around before; it was a circuit that could lift or cut frequency bands, and apply phase shifts that were very close to those occurring in nature. Because the circuit worked by modifying electronic feedback, it was very controllable and did not need ‘make-up’ amplifiers.
The basic ‘Baxandall’ equaliser is simply shelving HF and LF curves controlled by two rotary controls. Both the LF and HF have little or no effect at the centre frequency (normally around 1KHz) but have increasing effect out towards maxima at about 100Hz and 10KHz. There are many variations on the ‘Baxandall’ circuit, including mid frequency boost and cut sections. This is probably the best and most natural sounding circuit configuration for an EQ.
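As a rough illustration of those shelving curves, here is a minimal Python sketch of generic first-order shelf responses using the corner frequencies mentioned above; these are textbook curves with names of my own, not the circuit of any particular unit:

```python
import math

# First-order shelving responses: shelf_db is the maximum boost
# (use a negative value for cut).

def low_shelf_gain_db(f, corner=100.0, shelf_db=6.0):
    """Gain of a first-order low shelf: shelf_db below the corner,
    flat well above it."""
    g = 10 ** (shelf_db / 20)                       # linear shelf gain
    w, w0 = 2 * math.pi * f, 2 * math.pi * corner
    return 20 * math.log10(math.hypot(w, g * w0) / math.hypot(w, w0))

def high_shelf_gain_db(f, corner=10000.0, shelf_db=6.0):
    """Gain of a first-order high shelf: shelf_db above the corner."""
    g = 10 ** (shelf_db / 20)
    w, w0 = 2 * math.pi * f, 2 * math.pi * corner
    return 20 * math.log10(math.hypot(g * w, w0) / math.hypot(w, w0))

for f in (20.0, 100.0, 1000.0, 10000.0, 20000.0):
    print(f"{f:7.0f}Hz  LF {low_shelf_gain_db(f):+5.2f}dB  "
          f"HF {high_shelf_gain_db(f):+5.2f}dB")
```

Notice that both shelves are within a fraction of a dB of flat at 1KHz, which is why the two controls interact so little at the centre of the band.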
More violent and much better on paper are the ‘state variable filter’ equalisers. These are based on clever filter electronics where it is possible to tune circuits to precise frequencies and vary the ‘Q’ (quality factor) of the peak. This sounds fine in theory, but in nature there is no sound that contains high ‘Q’ values; anything modified with these circuits runs a high risk of sounding awful because of the confusing effect on the ears.
Similar things can be said about many of the ‘gyrator’ type filter circuits that abound in so called ‘professional’ mixers and equipment. They can be made to operate at high ‘Q’ values, but by doing this, the phase relationships in the signal are sharply altered and the ear is easily confused.
But some form of EQ is necessary to pull instruments forward in a mix, and to generally tailor the sound both in mixing and mastering. For general purposes, where just a little alteration is needed, the simple ‘Baxandall’ type is by far the best, and most mixer channels are fitted with these; but where there are tuneable mids, beware of overuse!
In mastering, a much more specialised form of EQ is required: one where one can control bandwidth and produce shifts in tonal balance while being sure of retaining phase relationships, and thus a clean and transparent sound.
My own P9 EQ is designed precisely for this purpose, with ‘Baxandall’ based HF and LF controls, but with the addition of selectable frequencies, and two mid frequency sections where the peak is determined by a passive inductor/capacitor combination. To make the whole process repeatable, every control is a calibrated switch.
At the risk of sounding really boring, I have to repeat what you must have heard many times before; don’t use an equaliser unless you have a clear idea of what you are doing or trying to achieve.
The TFPRO P9 ‘Ted’s definitive equaliser’.
Using an equaliser is distorting the audio signal: it’s changing its spectrum, altering the relative loudnesses. Gentle use of equalisation will pull sounds forward or push them away. Any more violent use will cause confusion to the ear and destroy the illusion that the brain has created. It’s almost like looking at one of those three-dimensional computer pictures that you have to train your eyes to see; once you can see the hidden image it is as clear as anything, but if anything disturbs you, the image collapses and your eyes see chaos.
Of course there’s a place for more extreme equalisation: it’s in effects, where you will be trying to get a particular guitar sound or make the piano sound even more striking. And yet isn’t it strange how often you think you can achieve an effect with ‘just a bit more EQ’, yet when it comes to it, it really doesn’t work; all you achieve is brown mush or grating distractions.
No, the best engineers are very sparing with the EQ; they use very small amounts of ‘tilt’ to achieve positioning and detail, but they rely heavily on microphones, microphone placing and good performance to get good recordings.
(INVITE QUESTIONS…. GENERAL DISCUSSION ABOUT USE OF EQ)
(RECORD The Ragpicker’s Dream Track 10)
(GENERAL PRODUCTION TOPICS….)
During the mixing process, the three factors which most affect the placing of the sound in a mix are RELATIVE LEVELS, RELATIVE FREQUENCIES, and COMPRESSION.
The RELATIVE LEVELS are taken care of, obviously, in the mixing process, with faders.
The RELATIVE FREQUENCIES can be adjusted to some extent with EQ.
So we come to the most neglected factor, COMPRESSION.
Now, we are not talking about ‘compression’ as a method of reducing dynamic range here. That’s a function of compression where the aim is to make the effect as transparent as possible; it’s just to fit the signal into the available dynamic range on the medium you’re working with. Very briefly, this is best achieved nowadays by some very clever computer algorithms that operate independently on different frequency bands, yet are timed cleverly so that they are not noticeable (except for their dreadful misuse on Radio 2, where the voices of the presenters make me cringe).
What we are talking about is three areas of compression:
- Compression of individual signals; instruments or voices.
- Group compression.
- Overall compression.
I’m going to be just a little controversial here; most recording engineers try to record the human voice completely ‘flat’ (with no effects) so that any required effects can be added later during the mixing process. When I record a solo voice, I usually use a little compression. The reason is that to make a voice audible in a mix, some compression is always necessary, but the real reason is probably that until very recently it was not possible to record the true dynamic range of a voice at all!
The purpose of compressing individual signals is to stabilise their position in the mix.
If you have read any of my writings about compression, I generally go on about the automatic biological compression effects of the ear, how our ears ‘turn down’ loud sounds. My compressor designs try to imitate the time constants of the ear because when we apply this sort of compression to a signal, whether it be instrument or voice, we are fooled into thinking that it’s louder than it really is.
In practice this works really well, and my own tests demonstrate (to me at least!) that numbers of signals, individually compressed, can be mixed without causing confusion.
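A very simplified sketch of this sort of level-dependent gain control, in Python: an envelope follower with separate attack and release time constants feeding a gain computer. The threshold, ratio and timing values here are purely illustrative, not the constants of any real unit.

```python
import math

def envelope(signal, sample_rate=48000, attack_ms=5.0, release_ms=150.0):
    """Track the signal's envelope: fast rise (attack), slow fall (release)."""
    atk = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for s in signal:
        level = abs(s)
        coeff = atk if level > env else rel    # pick the right time constant
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def compress(signal, threshold=0.5, ratio=4.0, **kw):
    """Reduce gain when the envelope exceeds the threshold."""
    out = []
    for s, env in zip(signal, envelope(signal, **kw)):
        if env > threshold:
            # keep the threshold, scale the excess down by the ratio
            gain = (threshold + (env - threshold) / ratio) / env
        else:
            gain = 1.0
        out.append(s * gain)
    return out
```

Because the gain follows the envelope rather than the instantaneous wave, the attack and release settings decide how the compressor ‘breathes’, and it’s exactly those time constants that can be tuned towards the ear’s own behaviour.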
Of course, individual signal compression is almost always necessary for an artistic reason as well; it’s needed to control the dynamic range of the loud parts and to make the quiet parts useable. That may sound obvious, but it’s still essential for all but the most ‘purist’ recordings.
When Joe Meek recorded lead vocals, his method was to drive the output of his microphone amplifier directly into a compressor, and from there, into his mixer, and then on to tape. This worked well and I have always done the same, in fact, that’s the basis for the various ‘Voice Channels’ that I have designed over the years.
The P10 ‘The MIGHTY TWIN’
To digress for a moment, the latest ‘Voice Channel’ that’s just finishing development will be a combination of mic amp, compressor with limiter function, and equaliser. It will be a transformer-linked mic preamp followed by an asymmetric compressor that takes into account the way the human voice is very non-symmetrical, and then the EQ.
An interesting feature is the ‘VARIABLE PHASE’ facility: it is there to be able to ‘tune’ the phase of microphones that are close together, where there might be cancellation problems… particularly when recording drums.
So the mic amp has a particularly huge overload margin so that high output mics close to the kit will be handled OK.
(ASYMMETRIC COMPRESSION AD LIB)
I have to suggest that it’s generally not such a good idea to use compression on individual musical instruments, except possibly guitars in a pop music context.
Some compression is often necessary (for dynamics reasons!) when recording brass sections, but it’s easy to get quite horrid results using compression on strings or most reed instruments. You can hear some early mistakes in this respect on records of the 1970s where the producer tried to use the Mellotron as an instrument; the results were uniformly horrible, and I think mainly because of the necessary heavy compression used to ‘smooth out’ the dynamics of the instruments.
The most obvious and most effective use of compression is where you group together a number of sound sources into a sub-group, and apply compression to the whole group at one go.
The drum sub-group is the one that springs to mind straight away, and this is where creative musical compression comes into its own.
It’s always a good thing to have a drum kit sounding louder than it actually is! I’m tempted to say that it’s difficult to overcompress most sorts of drums on pop records; I’ve seen them compressed to the most amazing extremes, and definitely the best sort of compressor for this is the slow photoelectric type like the Urei LA2A or the old Joemeek SC2 or its modern equivalent.
A fairly slow attack time and a shortish release would sound too ‘breathy’ on most things, but on drums it can give depth and urgency, and hold them stable so that they become the backbone of the mix.
In my compressor designs I always tailor the sidechain response so that the compression is less sensitive to low frequencies. What this means is that you can compress the whole kit with some extreme compression without worrying about the bass drum causing ‘wallowing’; in fact you can even include the bass guitar within the same sub-group, making the ‘bass and drums’ really tight.
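The principle is easy to sketch in software: high-pass filter only the detector (sidechain) signal, leaving the audio path itself untouched. The corner frequency here is an illustrative guess, not a figure from any actual Joemeek or TFPRO design.

```python
import math

def sidechain_highpass(signal, sample_rate=48000, corner_hz=150.0):
    """One-pole high-pass: attenuates content below corner_hz.
    Feed this filtered signal to the level detector, not to the output."""
    a = math.exp(-2.0 * math.pi * corner_hz / sample_rate)
    y, prev_x, out = 0.0, 0.0, []
    for x in signal:
        y = a * (y + x - prev_x)   # pass changes, leak away the steady part
        prev_x = x
        out.append(y)
    return out
```

With the detector fed this way, a 50Hz bass drum fundamental drives the gain reduction far less than the mid-band content does, so the whole kit can be squeezed hard without the low end pumping everything else.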
You could write in the same characteristics to a software compressor, but I can absolutely guarantee that it won’t sound the same!
Group compression is also very useful with vocal group recording. Here again one can use some quite severe compression, but the release time is more critical because of how we hear the human voice; any sort of ‘flutter’ as the compression acts, can sound very unnatural. The classics for a demonstration of the use of EFFECT compression on voices are the Beach Boys, and even more so, Queen.
Copyright Ted Fletcher 2005