why the octave is divided into 12

(This post uses some math, but nothing beyond first-year college level.)

Sound is a continuum. Frequencies below 20 Hz are infrasound. Frequencies above 20,000 Hz are ultrasound. Both are inaudible.

Try it: drag the slider

⚠️ Do not increase the volume to try to hear frequencies you can't hear. You may damage your hearing.

(Interactive slider: 0 Hz to 20 kHz)

We divide this continuum into octaves, and octaves are divided into 12: C-C#-D-D#-E-F-F#-G-G#-A-A#-B. But why 12?

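What if the octave is divided into a different number of steps instead?

(Interactive: with 8 equal divisions starting at 440 Hz, the notes fall at 440, 480, 523, 571, 622, 679, 740, 807, and 880 Hz)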

To answer why 12, let's forget everything we know and pretend we are (smart musical) monkeys who want to layer and sequence pretty sounds so we can entertain ourselves by the campfire and impress our favourite monkey. We want to record and reproduce the sequences that sound good — the ones that don't scare away our favourite monkey. So, we need a way to store and reference the sounds we produce: we need a tuning system.

Here's the plan: we'll make a bunch of sounds, figure out which ones sound nice together, then we can give names to these sounds.

let's make sounds

We managed to get a hold of a string — we pulled a hair from a horse's tail without getting kicked. We hold it taut and ask our favourite monkey Koko to pluck it. It makes a sound.

We adjust our grip on the string. Koko plucks it again and it makes a different sound. Koko wants to know why.

Let's pretend we're very advanced monkeys who understand the basics of sound: soundwaves with different frequencies produce different sounds.

We tell Koko that when we adjust the length of the string, it vibrates at a different frequency, changing the pitch of the sound.

Koko is not satisfied with that answer. Koko is mathematically inclined and wants to know exactly how pitch is related to frequency.

Okay. Let's say the full string vibrates at frequency $f$. We'll shorten the string to 1/2, 1/3, 1/4, ... of its original length. Since frequency is inversely proportional to length, we're producing sounds at frequencies $2f, 3f, 4f, \ldots$

Listen to how pitch changes with frequency:

Click on the frequency markers (1f, 2f, 3f...).

(Interactive: frequency markers 1f to 8f, at string lengths 1, 1/2, 1/3, ..., 1/8, with f = 220 Hz)

Clearly, the higher the frequency, the higher the pitch. What else do we notice?

Koko points out that the intervals $f \to 2f$, $2f \to 4f$, and $4f \to 8f$ sound like equal steps to our ears. The intervals sound equidistant even though the frequency gap doubles each time: from $f$ to $2f$ to $4f$.

Modelling pitch as some function of frequency, $P(f)$, we have:

$$
\begin{gathered}
P(2f) - P(f) = \Delta p \\
P(4f) - P(2f) = \Delta p \\
P(8f) - P(4f) = \Delta p
\end{gathered}
$$

where $\Delta p$ is the pitch interval.

We notice that for any ratio $r$, $P(rf) - P(f)$ is constant for all $f$. That is, $\Delta p$ is a function of $r$ alone:

$$P(rf) - P(f) = g(r)$$

This means $P$ must be logarithmic (up to a scale factor and an offset), since the logarithm has exactly this property:

$$\log(rf) - \log(f) = \log(r)$$

In the above example, we have $r = 2$, so $\Delta p = \log(2)$. We've discovered that pitch depends logarithmically on frequency.
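Here is a quick numeric check of that claim, as a minimal sketch. The helper name `pitch_interval`, the base frequency of 220 Hz, and the choice of a base-2 logarithm (so that one doubling counts as one unit of pitch) are assumptions made for illustration; any other base only changes the scale.

```python
import math

def pitch_interval(f_low: float, f_high: float) -> float:
    """Pitch distance between two frequencies, in doublings (hypothetical helper)."""
    return math.log2(f_high / f_low)

f = 220.0  # assumed base frequency in Hz

# The frequency gaps grow (f, 2f, 4f), but the pitch steps stay equal.
for low, high in [(f, 2 * f), (2 * f, 4 * f), (4 * f, 8 * f)]:
    print(f"{low:.0f} Hz -> {high:.0f} Hz: gap = {high - low:.0f} Hz, "
          f"pitch step = {pitch_interval(low, high):.1f}")
```

The same holds for any other ratio: going from 100 Hz to 300 Hz is the same pitch step as going from 200 Hz to 600 Hz.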

For fun, we play $f, 2f, 3f, \ldots$ in succession. We realize that the required string lengths are the terms of the harmonic series:

$$\sum_{n=1}^\infty \frac{1}{n}$$

We tell Koko that this is what the harmonic series sounds like in music:

Koko is intrigued, but not yet impressed. Koko wants to hear them layered. This request demands another horse tail hair.

We narrowly avoid getting kicked by the horse and obtain a second string, equal in length and thickness to the first. Now we can play two notes at once.

let's figure out what sounds good

Try adjusting the frequency on both strings.

Which frequencies sound good together?

(Interactive: two strings, each adjustable from 1f to 8f)

It seems like simple ratios create consonant intervals, and complex ratios create dissonant intervals. Now what? What can we do with this information?

Koko reminds us of our promise: we wanted to construct a tuning system, so we can record and recreate beautiful sequences of sounds to enjoy together by the campfire. Right. What have we found so far?

Earlier we discovered that doubling the frequency gives us the same note in a higher register. That means we have repeating units that we can divide the continuum into: $[1f, 2f), [2f, 4f), [4f, 8f), \ldots$

We'll focus on just one of these units: the interval between $1f$ and $2f$. We'll find notes we want to play within the interval $[1f, 2f)$. Those notes will make up our tuning system.

let's find those notes

Starting from $1f$, we want to get the most consonant sounds, so let's apply the simplest ratios to it.

2:1 gives us $2f$, which is the start of the next interval.

3:1 gives us $3f$, which is again outside our interval, but we can divide it by 2 to bring it back in bounds. Now we have our second note: it has frequency $(3/2)f$.

Now we can keep going and try bringing $4f, 5f, 6f, \ldots$ into $[1f, 2f)$, but this approach will give us many of the same notes. E.g. $4f$ will need to be divided by 4, and that gives us $1f$ again. $6f$ will also need to be divided by 4, giving us $(3/2)f$ again.
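To see how quickly the repeats pile up, here is a small sketch that folds the first 8 harmonics back into the unit. The helper `fold_into_unit` is a hypothetical name, and exact fractions are used so that repeats compare equal.

```python
from fractions import Fraction

def fold_into_unit(ratio: Fraction) -> Fraction:
    """Halve (or double) a ratio until it lies in [1, 2). Hypothetical helper."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

# Fold the first 8 harmonics back into [1f, 2f) and see which notes are new.
seen = set()
for n in range(1, 9):
    note = fold_into_unit(Fraction(n))
    status = "repeat" if note in seen else "new"
    seen.add(note)
    print(f"{n}f -> ({note})f  [{status}]")
```

Only the odd harmonics contribute new notes here; every even harmonic folds back onto one we already have.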

Let's try a different approach. From $1f$, we went up to $(3/2)f$. What if we wanted to play that interval, starting from $(3/2)f$? What if we kept going up by $3/2$ and dividing by powers of 2 to bring the note back inside $[1f, 2f)$? Once we return to $1f$, we'll have exhausted all possible notes, right? Let's try that:

(3/2)^0 = 1  
(3/2)^1 = 1.5  
(3/2)^2 = 1.125    (after dividing by 2)  
(3/2)^3 = 1.6875   (after dividing by 2)  
(3/2)^4 = 1.2656   (after dividing by 4)  
(3/2)^5 = 1.8984   (after dividing by 4)  
(3/2)^6 = 1.4238   (after dividing by 8)  
(3/2)^7 = 1.0678   (after dividing by 16)  
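The table above can be reproduced with a few lines of code. This is a minimal sketch: `stack_fifths` is a hypothetical helper name, and it keeps exact fractions so no rounding error builds up. The printed decimals are rounded to four places, so a last digit can differ slightly from the table.

```python
from fractions import Fraction

def stack_fifths(count: int) -> list[Fraction]:
    """Return the notes reached by `count` successive 3/2 steps, folded into [1, 2)."""
    notes = []
    note = Fraction(1)
    for _ in range(count + 1):
        notes.append(note)
        note *= Fraction(3, 2)
        if note >= 2:      # one halving is enough, since note < 2 before the step
            note /= 2
    return notes

for n, note in enumerate(stack_fifths(7)):
    print(f"(3/2)^{n} -> {note} ~ {float(note):.4f}")
```

Calling `stack_fifths(12)` or `stack_fifths(24)` extends the same sequence.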

We've returned to a number pretty close to 1, but it doesn't seem like we'll ever return to 1 and complete the cycle. Let's verify this.

If it were possible, we'd have, for some integers $n \geq 1$ and $k \geq 0$,

$$1 \cdot (3/2)^n (1/2)^k = 1$$

which simplifies to $3^n = 2^{n+k}$. The left-hand side is odd and the right-hand side is even, so we have a contradiction. It's not possible to return to our starting frequency.
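The simplification step, spelled out (assuming $n \geq 1$ and $k \geq 0$ are integers, so that $n + k \geq 1$ and the right-hand side really is even):

$$
\begin{aligned}
\left(\tfrac{3}{2}\right)^n \left(\tfrac{1}{2}\right)^k &= 1 \\
\frac{3^n}{2^n \cdot 2^k} &= 1 \\
3^n &= 2^{n+k}
\end{aligned}
$$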

We currently have 7 notes in between:

(Figure: the notes on a number line from 1 to 2, at 1.5000, 1.1250, 1.6875, 1.2656, 1.8984, 1.4238, and 1.0678)

The points seem a bit sparse. Let's keep going:

(3/2)^0 = 1  
(3/2)^1 = 1.5  
(3/2)^2 = 1.125    (after dividing by 2)  
(3/2)^3 = 1.6875   (after dividing by 2)  
(3/2)^4 = 1.2656   (after dividing by 4)  
(3/2)^5 = 1.8984   (after dividing by 4)  
(3/2)^6 = 1.4238   (after dividing by 8)  
(3/2)^7 = 1.0678   (after dividing by 16)  
(3/2)^8 = 1.6018   (after dividing by 16)  
(3/2)^9 = 1.2013   (after dividing by 32)  
(3/2)^10 = 1.8020  (after dividing by 32)  
(3/2)^11 = 1.3515  (after dividing by 64)  
(3/2)^12 = 1.0136  (after dividing by 128)  

After 12 iterations, we arrive at 1.0136 — much closer to the starting frequency.

(Figure: the notes on a number line from 1 to 2, at 1.5000, 1.1250, 1.6875, 1.2656, 1.8984, 1.4238, 1.0678, 1.6018, 1.2013, 1.8020, 1.3515, and 1.0136)

This seems like a good point to stop — we now have pretty good coverage of the interval. If we kept going until 24, it would look like this:

(Figure: all 24 notes on a number line from 1 to 2. The 12 additions are 1.5204, 1.1403, 1.7105, 1.2829, 1.9243, 1.4432, 1.0824, 1.6236, 1.2177, 1.8266, 1.3699, and 1.0274)

This is no good. The new additions seem to be a repetition of the first 12 notes, just with a small shift to the right. This confirms that 12 is the optimal number to divide our interval into.
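One way to sanity-check the choice of 12 is to ask which number of 3/2 steps lands closest to a note we already have. The sketch below measures the miss in cents (1200 cents per doubling of frequency), a standard unit that the rest of this post doesn't rely on. The helper name `cents_from_octave` and the search range of 1 to 24 are assumptions for illustration.

```python
import math

def cents_from_octave(n: int) -> float:
    """Signed distance, in cents, between n stacked 3/2 steps and the nearest doubling."""
    octaves = n * math.log2(3 / 2)          # pitch reached, measured in doublings
    return 1200 * (octaves - round(octaves))

# Which n in 1..24 lands closest to the starting note?
best = min(range(1, 25), key=lambda n: abs(cents_from_octave(n)))
print(best, round(cents_from_octave(best), 2))   # prints: 12 23.46
```

Within this range, 12 steps give the smallest miss, and 24 steps simply double it.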

We assign letters to the first 7 notes: A, B, C, D, E, F, G. The 5 that come after are in between $A \to B$, $C \to D$, $D \to E$, $F \to G$, and $G \to A$. To refer to those notes, we append the $\sharp$ symbol to the former of each pair, or $\flat$ to the latter.

Great. We now have ourselves a tuning system composed of 12 notes per unit. Koko is finally impressed.


Okay. Enough pretending to be smart monkeys. Serious talk from here on.

The monkeys we imagined ourselves to be are geniuses. The logic mirrors how humans really discovered these relationships — though it took centuries, not one afternoon by the campfire. The ancient Greeks, medieval monks, and Renaissance composers all grappled with these same mathematical puzzles.

We've built a tuning system from the ground up and discovered that it is optimal to divide an octave into 12. But the system we use today has 12 equal divisions. How did we get there?

let's explore tuning systems

The system we built is referred to as the Pythagorean tuning system. It was used in medieval times, when thirds were considered dissonant.

Other than the obvious flaw (we can't return to 1 to close the circle), major thirds sound horrible in this system. The ratio for a major third is supposed to be 5/4. But this tuning gives us a complex ratio: 81/64.
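To put a number on how far 81/64 is from 5/4, here is a small sketch. It reports the gap both as a ratio and in cents (1200 cents per doubling of frequency); the variable names are illustrative.

```python
import math
from fractions import Fraction

pythagorean_third = Fraction(3, 2) ** 4 / 4   # four 3/2 steps, folded back by two doublings
pure_third = Fraction(5, 4)

gap = pythagorean_third / pure_third
gap_cents = 1200 * math.log2(float(gap))

print(pythagorean_third, gap, round(gap_cents, 1))   # prints: 81/64 81/80 21.5
```

For comparison, one step of today's 12-note system is 100 cents, so this third is off by roughly a fifth of a step.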

Is there a better way? Instead of only using fifths (3/2) as building blocks, what if we also used the fourth and the third, and combined them to construct intervals that aren't in the harmonic series?

E.g. let's start with C. We add the following lower-order harmonics to our system:

  • Perfect fifth: 3:2 → G
  • Perfect fourth: 4:3 → F
  • Major third: 5:4 → E

We could then combine these intervals by stacking them:

  • (major third) + (perfect fourth) = major sixth
    • $(5/4)(4/3) = 5/3$ → A
  • (major third) + (perfect fifth) = major seventh
    • $(5/4)(3/2) = 15/8$ → B
  • (perfect fifth) - (major third) = minor third
    • $(3/2) / (5/4) = 6/5$ → E♭
  • (perfect fifth) + (perfect fifth) - octave = major second
    • $(3/2)^2 (1/2) = 9/8$ → D
  • and so on
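The stacking above is just multiplication and division of ratios, which is easy to check with exact fractions. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

fifth, fourth, major_third = Fraction(3, 2), Fraction(4, 3), Fraction(5, 4)

# Adding intervals means multiplying ratios; subtracting means dividing.
major_sixth = major_third * fourth        # 5/3
major_seventh = major_third * fifth       # 15/8
minor_third = fifth / major_third         # 6/5
major_second = fifth * fifth / 2          # 9/8, after dropping one octave

for name, ratio in [("major second", major_second), ("minor third", minor_third),
                    ("major sixth", major_sixth), ("major seventh", major_seventh)]:
    print(f"{name}: {ratio} ~ {float(ratio):.4f}")
```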

The system constructed in this manner has pure tones, but modulating becomes a nightmare — each key needs retuned ratios. The F# in A major is different from the F# in D major.

This system — Just Intonation — emerged during the Renaissance:

(Figure: the 12 Just Intonation notes on a number line from 1 to 2, at 1.0000, 1.0667, 1.1250, 1.2000, 1.2500, 1.3333, 1.4063, 1.5000, 1.6000, 1.6667, 1.8000, and 1.8750)

Shortly after, the meantone temperament came about. It's similar to the Pythagorean system, but with the trade-off reversed: where Pythagorean tuning sacrifices pure thirds for pure fifths, meantone sacrifices pure fifths for pure thirds:

(Figure: the 12 meantone notes on a number line from 1 to 2, at 1.0000, 1.0449, 1.1180, 1.1963, 1.2500, 1.3375, 1.4095, 1.5000, 1.6719, 1.7889, 1.8692, and 1.9634)

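In a fixed-pitch 12-note system, it is mathematically impossible to have both pure thirds and pure fifths. This is a prime factorization incompatibility: pure fifths are built from the prime 3, pure thirds from the prime 5, and octaves from the prime 2, and no positive power of one prime ever equals a power of another.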

Since we can't have everything, we gave up everything: the tuning system we use nowadays doesn't have a single pure ratio other than the octave itself. Since we divided the octave into 12 equal steps for convenience, every note is slightly out of tune. Perfect pitch is really just perfect memory of imperfectly tuned notes.

(Figure: the 12 equally spaced notes on a number line from 1 to 2, at 1.0000, 1.0595, 1.1225, 1.1892, 1.2599, 1.3348, 1.4142, 1.4983, 1.5874, 1.6818, 1.7818, and 1.8877)

This system is called 12TET, or 12-tone equal temperament.
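Each 12TET note is the previous one multiplied by $2^{1/12}$, so the whole system fits in one line of code. The sketch below also compares it against Just Intonation, measuring the deviation in cents (1200 cents per doubling of frequency). The `JUST` list is an assumption: it uses the common set of ratios whose decimals match the Just Intonation values listed earlier, and `equal_temperament` is a hypothetical helper name.

```python
import math
from fractions import Fraction

# Assumed Just Intonation ratios for the 12 notes (decimals match the values listed earlier).
JUST = [Fraction(1), Fraction(16, 15), Fraction(9, 8), Fraction(6, 5),
        Fraction(5, 4), Fraction(4, 3), Fraction(45, 32), Fraction(3, 2),
        Fraction(8, 5), Fraction(5, 3), Fraction(9, 5), Fraction(15, 8)]

def equal_temperament(steps: int = 12) -> list[float]:
    """Ratios of an octave split into `steps` equal pitch steps."""
    return [2 ** (k / steps) for k in range(steps)]

for k, (tet, just) in enumerate(zip(equal_temperament(), JUST)):
    deviation = 1200 * math.log2(tet / float(just))   # in cents
    print(f"step {k:2d}: 12TET {tet:.4f}  just {float(just):.4f}  deviation {deviation:+.1f} cents")
```

Passing a different `steps` value, e.g. `equal_temperament(8)`, gives other equal divisions of the octave.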

12-TET vs Just Intonation

Click keys to hear the difference. The 12-TET keyboard shows deviations from pure Just Intonation ratios.

(Interactive: two keyboards, one in Just Intonation (pure ratios) and one in 12-tone equal temperament)

In the 1800s, mass production of pianos demanded standardization. 12TET offered a reprieve from choosing between pure fifths, pure thirds, and modulatory freedom.

Musicians could modulate freely, piano makers could build one instrument for all repertoire, and the slight deviations from pure tones stayed below perceptual thresholds... for the most part:

This is why Jacob Collier's harmonies sound out of this world.

--

12TET is close enough to perfection. Math demanded compromise, culture chose convenience, network effects locked in the standard. If we demanded perfection, guitars would have squiggly frets.