why the octave is divided into 12

The octave is divided into 12: C-C#-D-D#-E-F-F#-G-G#-A-A#-B. Why 12?

To answer this, let's forget everything we know and construct a tuning system from the ground up. Our goal is to take the continuum of sounds, pick out pitches that sound good together, store them, and give them names, so that we can reference them easily to make music.

But how many should we pick out? Let's find out: we'll start with a note that has frequency $f$, and find notes that pair well with it.

okay, but what makes something sound good?

Pitch is determined by the frequency of the soundwave. So, our goal here is to identify soundwaves that sound good when they're stacked.

Let's visualize the waveforms for different frequencies: $f, 2f, 3f, \ldots$

Harmonic relationships visualized

Pitch is logarithmically dependent on frequency. The second wave, with twice the first wave's frequency, is an octave higher than the first. You can verify this on a guitar.

We notice that the first four intervals are nice-sounding ones. The bottom two are not. Is there a pattern here? What do we observe about the relationship between frequencies and harmonies? It seems that:

  • simple ratios $\iff$ consonant intervals
  • complex ratios $\iff$ dissonant intervals

E.g. if we stack the last and third-to-last waves, we get a tritone. The irregular beating of the resulting composite wave is what makes it sound bad.
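One rough way to put a number on "simple" versus "complex": when two sine waves have frequencies in the ratio p/q (in lowest terms), their sum repeats after q cycles of the lower wave. The sketch below is only an illustration. The function name and the example ratios are assumptions, including 45/32 as one common tuning of the tritone; the waves in the figure may use a different ratio.

```python
from fractions import Fraction

def cycles_until_repeat(ratio: Fraction) -> int:
    """Cycles of the lower wave before the combined waveform repeats.

    For a frequency ratio p/q in lowest terms, the two sine waves line up
    again after q cycles of the lower wave (and p cycles of the upper one).
    """
    return ratio.denominator

# Example ratios, assumed for illustration.
intervals = {
    "octave": Fraction(2, 1),
    "fifth": Fraction(3, 2),
    "fourth": Fraction(4, 3),
    "major third": Fraction(5, 4),
    "tritone": Fraction(45, 32),
}

for name, ratio in intervals.items():
    print(f"{name:12} {str(ratio):6} repeats after {cycles_until_repeat(ratio)} cycle(s)")
```

The shorter the repeat, the more regular the composite wave looks, which lines up with the pattern observed above.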

This explains why old guitar strings don't sound as good as new ones — the imperfections on the strings cause irregularities in the sound waves.

In a way, harmony illustrates our preference for simplicity.

harmonic series

Notice that the wavelengths of those waves are in the ratios $1, \frac{1}{2}, \frac{1}{3}, \ldots$ The sum of those ratios is the harmonic series:

$$\sum_{n=1}^\infty \frac{1}{n}$$

In music, the harmonic series is a sequence of notes that sounds like this:

You're hearing the waveforms above, played in succession: base note, note with 2x that frequency, note with 3x that frequency, and so on.

A note with n x the base note's frequency is called the n-th harmonic, or the (n-1)th overtone.
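As a quick sketch of that naming, here are the first few harmonics of a base note. The function name and the 110 Hz base frequency are assumptions chosen for the example.

```python
def harmonic(base_hz: float, n: int) -> float:
    """Frequency of the n-th harmonic (n = 1 is the base note itself)."""
    if n < 1:
        raise ValueError("harmonics are numbered from 1")
    return n * base_hz

base_hz = 110.0  # assumed example value
for n in range(1, 7):
    label = "base note" if n == 1 else f"overtone {n - 1}"
    print(f"harmonic {n}: {harmonic(base_hz, n):6.1f} Hz  ({label})")
```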

Except in a true sine wave, harmonics are present in any pitch. Any time you play a note on an instrument, you hear not only the base note, but also the harmonics.

These harmonics are the reason the same note sounds different across instruments. Their magnitudes determine the timbre of each instrument.

Stripping away the harmonics from a C on a violin and a C on a clarinet, we get the same fundamental sound.
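Here is a minimal sketch of that idea: two tones built on the same base frequency but with different harmonic magnitudes. The magnitudes are made up and do not model any real instrument, and `tone` is a hypothetical helper.

```python
import math

def tone(t: float, base_hz: float, magnitudes: list[float]) -> float:
    """Sample at time t (seconds) of a tone built from harmonics.

    magnitudes[0] scales the base note, magnitudes[1] the 2nd harmonic,
    and so on.
    """
    return sum(
        a * math.sin(2 * math.pi * (n + 1) * base_hz * t)
        for n, a in enumerate(magnitudes)
    )

# Made-up harmonic magnitudes for two imaginary instruments.
instrument_a = [1.0, 0.5, 0.33, 0.25]
instrument_b = [1.0, 0.0, 0.33, 0.0]

base_hz = 261.63  # roughly middle C
t = 0.0007        # an arbitrary moment in time

print(tone(t, base_hz, instrument_a))
print(tone(t, base_hz, instrument_b))

# Keeping only the base note makes the two samples identical.
print(tone(t, base_hz, instrument_a[:1]) == tone(t, base_hz, instrument_b[:1]))
```

Both waveforms repeat at the same rate, so we hear the same pitch; only the shape within each cycle differs.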

quick recap

  • frequency determines pitch
  • frequency ratio determines harmony: simple sounds good
  • harmonics determine timbre

let's find more notes

Let's start with $f = 440 \text{ Hz}$, which is what we call A. We want to find notes within the octave, i.e. $f \in [440, 880]$.

What pairs well with our current note? The most consonant interval is the octave, but that doesn't help us find a new note to add. What's the next most consonant interval? The fifth.

Let's go up a fifth: 440 x 3/2 = 660. Now we have our second note.

From this note, we want to be able to play a fifth as well. So, why don't we try stacking fifths and see what happens?

We go up another fifth to get f = 660 x 3/2 = 990. This is outside our octave, so we divide by 2 to get f = 495.

Now we repeat:

  • Go up a fifth to get 495 x 3/2 = 742.5.
  • Go up a fifth and down an octave to get 742.5 x 3/2 / 2 = 556.875.
  • ...
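The steps above can be sketched in a few lines. The function name `stack_fifths` is an assumption, and the sketch treats the octave as the half-open range from the base frequency up to (but not including) twice the base.

```python
def stack_fifths(base_hz: float, count: int) -> list[float]:
    """Stack `count` fifths on base_hz, folding back into [base, 2 * base)."""
    notes = [base_hz]
    freq = base_hz
    for _ in range(count):
        freq *= 3 / 2              # go up a fifth
        if freq >= 2 * base_hz:    # left the octave: go down an octave
            freq /= 2
        notes.append(freq)
    return notes

print(stack_fifths(440.0, 4))
# [440.0, 660.0, 495.0, 742.5, 556.875]
```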

Once we return to $f = 440$, we'll have exhausted all possible notes, right? Does that happen?

Let $n$ = number of times we went up a fifth, and $k$ = number of times we went down an octave. If we get back to 440, that means we have

$$440 \cdot (3/2)^n (1/2)^k = 440$$

which simplifies to $3^n = 2^{n+k}$. For any $n \geq 1$ this is a contradiction: LHS is odd, RHS is even.

So, we won't ever return to our starting frequency. Now what? How close do we get?

(3/2)^0 = 1  
(3/2)^1 = 1.5  
(3/2)^2 = 1.125    (after dividing by 2)  
(3/2)^3 = 1.6875   (after dividing by 2)  
(3/2)^4 = 1.2656   (after dividing by 4)  
(3/2)^5 = 1.8984   (after dividing by 4)  
(3/2)^6 = 1.4238   (after dividing by 8)  
(3/2)^7 = 1.0678   (after dividing by 16)  
(3/2)^8 = 1.6018   (after dividing by 16)  
(3/2)^9 = 1.2013   (after dividing by 32)  
(3/2)^10 = 1.8020  (after dividing by 32)  
(3/2)^11 = 1.3515  (after dividing by 64)  
(3/2)^12 = 1.0136  (after dividing by 128)  

After 12 applications of the ratio 3/2, we come back pretty close to where we started. Since 1.0136 is relatively close to 1, we stop after 12 iterations.
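To double-check those numbers without floating-point noise, here is a sketch using exact fractions. The helper name `fold_into_octave` is an assumption. The decimals are rounded to four places, so a last digit can differ from a listing that truncates instead.

```python
from fractions import Fraction

def fold_into_octave(ratio: Fraction) -> Fraction:
    """Halve a ratio >= 1 until it lies in [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    return ratio

fifth = Fraction(3, 2)
for n in range(13):
    folded = fold_into_octave(fifth ** n)
    print(f"n={n:2}  {str(folded):>15}  ~ {float(folded):.4f}")

# The leftover gap after 12 fifths, as an exact ratio.
gap = fold_into_octave(fifth ** 12)
print(gap)  # 531441/524288
```

That leftover ratio, 3^12 / 2^19, is often called the Pythagorean comma.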

This kind of answers our original question — we now have 12 notes in our tuning system. But there are problems with this system.

let's explore tuning systems

Other than the obvious flaw (can't return to 1 to close the circle), thirds sound horrible in this system. As we saw earlier, the ratio for thirds is supposed to be 5/4. But this tuning gives us a complex ratio: 81/64.

This system is referred to as Pythagorean tuning. It was used in medieval times, when thirds were considered dissonant.
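To see where that 81/64 comes from: the third in this system is four stacked fifths, brought down two octaves. A small sketch with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

# Four stacked fifths, brought down two octaves, land on the third.
pythagorean_third = Fraction(3, 2) ** 4 / 4
pure_third = Fraction(5, 4)

print(pythagorean_third)                            # 81/64
print(pythagorean_third / pure_third)               # 81/80
print(float(pythagorean_third), float(pure_third))  # 1.265625 1.25
```

So the stacked-fifths third is sharp of the pure one by a factor of 81/80.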

Is there a better way? Instead of only using fifths as building blocks, what if we also used the fourth and the third, and combined them to construct intervals that aren't in the harmonic series?

E.g. let's start with C. We add the following lower-order harmonics to our system:

  • Perfect fifth: 3:2 → G
  • Perfect fourth: 4:3 → F
  • Major third: 5:4 → E

We could then combine these intervals by stacking them:

  • (perfect fifth) + (perfect fifth) = major second
    • (3/2)^2 (1/2) = 9/8 → D
  • (perfect fifth) - (major third) = minor third
    • (3/2) / (5/4) = 6/5 → Eb
  • (major third) + (perfect fourth) = major sixth
    • (5/4)(4/3) = 5/3 → A
  • (major third) + (perfect fifth) = major seventh
    • (5/4)(3/2) = 15/8 → B
  • and so on
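The stacking above can be sketched with exact fractions: adding intervals multiplies their ratios, subtracting divides them, and octave shifts bring the result back into range. The helper name and the dictionary layout are assumptions made for the example.

```python
from fractions import Fraction

fifth, fourth, major_third = Fraction(3, 2), Fraction(4, 3), Fraction(5, 4)

def fold_into_octave(ratio: Fraction) -> Fraction:
    """Move a ratio into [1, 2) by octave shifts."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

# Adding intervals multiplies ratios; subtracting divides them.
derived = {
    "D": fold_into_octave(fifth * fifth),         # major second
    "Eb": fold_into_octave(fifth / major_third),  # minor third
    "A": fold_into_octave(major_third * fourth),  # major sixth
    "B": fold_into_octave(major_third * fifth),   # major seventh
}

for name, ratio in derived.items():
    print(f"{name:2} {ratio}")
```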

The system constructed in this manner has pure tones, but modulating becomes a nightmare — each key needs retuned ratios. The F# in A major is different from the F# in D major.

This system, Just Intonation, emerged during the Renaissance. Shortly after, the meantone temperament came about. It's similar to the Pythagorean system, but with the trade-off reversed: where Pythagorean tuning sacrifices pure thirds for pure fifths, meantone sacrifices pure fifths for pure thirds.

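In a fixed-pitch 12-note system, it is mathematically impossible to have both pure thirds and pure fifths. This is a prime factorization incompatibility: a stack of pure fifths is built only from powers of 3 and 2, a pure third needs a factor of 5, and no product of powers of 2 and 3 equals 5.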

Since we can't have everything, we gave up everything: apart from the octave itself, the tuning system we use nowadays doesn't have a single pure ratio. Technically, every other note is slightly out of tune. Perfect pitch is really just perfect memory of imperfectly tuned notes.

Below is the harmonic series. The numbers represent the cents needed to return to natural harmonics.

12 tone equal temperament

In the 1800s, mass production of pianos demanded standardization. By dividing the octave evenly into 12, 12-tone equal temperament offered a reprieve from choosing between pure fifths, pure thirds, and modulatory freedom.
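Dividing the octave evenly means each step multiplies the frequency by the twelfth root of 2. Here is a sketch that compares a few equal-tempered intervals with their pure ratios, measured in cents (a cent is 1/100 of an equal-tempered semitone, so an octave is 1200 cents). The function names are assumptions.

```python
import math

def cents(ratio: float) -> float:
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def equal_tempered(base_hz: float, semitones: int) -> float:
    """Frequency `semitones` equal steps above base_hz."""
    return base_hz * 2 ** (semitones / 12)

# (name, pure ratio, number of equal-tempered semitones)
intervals = [
    ("major third", 5 / 4, 4),
    ("perfect fourth", 4 / 3, 5),
    ("perfect fifth", 3 / 2, 7),
]

for name, pure, steps in intervals:
    tempered = equal_tempered(1.0, steps)
    print(f"{name:15} pure {cents(pure):7.2f}  tempered {cents(tempered):7.2f}  "
          f"diff {cents(tempered) - cents(pure):+6.2f} cents")
```

The fifth comes out about 2 cents flat and the major third about 14 cents sharp, which gives a sense of how small the compromise is for fifths and how much larger it is for thirds.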

Musicians could modulate freely, piano makers could build one instrument for all repertoire, and the slight deviations from pure tones stayed below perceptual thresholds — for the most part. 👈 why Jacob Collier's harmonies sound out of this world.

Although 12 equal increments isn't perfect, it's close enough. Math demanded compromise; culture chose convenience. Network effects locked in the standard. Without equal temperament, jazz wouldn't be possible!

Here's a parting gift: guitars with squiggly frets