why the octave is divided into 12

(This post uses some math, but nothing beyond first-year college level.)

Sound is a continuum. Frequencies below 20 Hz are infrasound. Frequencies above 20,000 Hz are ultrasound. Both are inaudible.

Try it: drag the slider

⚠️ Do not increase the volume to try to hear frequencies you can't hear. You may damage your hearing.

We divide this continuum into octaves, and octaves are divided into 12: C-C#-D-D#-E-F-F#-G-G#-A-A#-B. But why 12?

What if the octave were divided into a different number of notes instead?

To answer why 12, let's forget everything we know and pretend we are (smart musical) monkeys who want to layer and sequence pretty sounds so we can entertain ourselves by the campfire and impress our favourite monkey. We want to record and reproduce the sequences that sound good — the ones that don't scare away our favourite monkey. So, we need a way to store and reference the sounds we produce: we need a tuning system.

Here's the plan: we'll make a bunch of sounds, figure out which ones sound nice together, then we can give names to these sounds.

let's make sounds

We managed to get a hold of a string — we pulled a hair from a horse's tail without getting kicked. We hold it taut and ask our favourite monkey Koko to pluck it. It makes a sound.

We adjust our grip on the string. Koko plucks it again and it makes a different sound. Koko wants to know why.

Let's pretend we're very advanced monkeys who understand the basics of sound: soundwaves with different frequencies produce different sounds.

We tell Koko that when we adjust the length of the string, it vibrates at a different frequency, changing the pitch of the sound.

Koko is not satisfied with that answer. Koko is mathematically inclined and wants to know exactly how pitch is related to frequency.

Okay. Let's say the full string vibrates at frequency $f$. We'll shorten the string to 1/2, 1/3, 1/4, ... of its original length. Since frequency is inversely proportional to length, we're producing sounds at frequencies $2f, 3f, 4f, ...$
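Here is a minimal sketch of that relationship. It assumes an idealized string whose frequency is exactly inversely proportional to its vibrating length, with tension and thickness unchanged. The function name `frequency_for_fraction` and the 110 Hz base frequency are made up for illustration.

```python
from fractions import Fraction

def frequency_for_fraction(base_freq, fraction_of_length):
    """Frequency of an idealized string shortened to a fraction of its length.

    Assumes frequency is inversely proportional to length, with tension
    and thickness held fixed.
    """
    return base_freq / fraction_of_length

base = 110  # hypothetical frequency of the full string, in Hz
for n in range(1, 5):
    fraction = Fraction(1, n)
    print(f"length {fraction} -> {frequency_for_fraction(base, fraction)} Hz")
```

Halving the string doubles the frequency, and cutting it to a third triples it.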

Listen to how pitch changes with frequency:

Click on the frequency markers (1f, 2f, 3f...).

Clearly, the higher the frequency, the higher the pitch. What else do we notice?

Koko points out that the intervals $f \to 2f$ , $2f \to 4f$ , and $4f \to 8f$ sound like equal steps to our ears. The intervals sound equidistant even though the frequency gap doubled each time: from 1 to 2 to 4.

Modelling pitch as some function of frequency $P(f)$ , we have:

$$
\begin{aligned}
P(2f) - P(f) &= \Delta p \\
P(4f) - P(2f) &= \Delta p \\
P(8f) - P(4f) &= \Delta p
\end{aligned}
$$

where $Δp$ is the pitch interval.

We notice that for any ratio $r$, $P(rf) - P(f)$ is the same for all $f$. In other words, $\Delta p$ depends only on $r$:

$$P(rf) - P(f) = g(r)$$

This means $P$ must be logarithmic (assuming it is continuous), because the logarithm is what turns a ratio into a difference:

$$\log(rf) - \log(f) = \log(r)$$

In the above example, we have $r=2$ , so $Δp = \log(2)$ . We've discovered that pitch is logarithmically dependent on frequency.
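A quick numeric check of Koko's observation, as a sketch. The choice of base 2 for the logarithm is an assumption made here so that one octave comes out as exactly 1; any other base only rescales the units. The name `pitch_interval` and the 110 Hz base frequency are hypothetical.

```python
import math

def pitch_interval(f_low, f_high):
    """Pitch distance in octaves, modelling pitch as log2 of frequency."""
    return math.log2(f_high / f_low)

f = 110.0  # hypothetical base frequency in Hz

# The intervals f -> 2f, 2f -> 4f, 4f -> 8f
steps = [pitch_interval(f * 2**k, f * 2**(k + 1)) for k in range(3)]
print(steps)

# The same ratio gives the same pitch step no matter where we start
print(pitch_interval(100, 150), pitch_interval(200, 300))
```

The frequency gaps are 110 Hz, 220 Hz and 440 Hz, yet all three pitch steps come out the same size.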

For fun, we play $f, 2f, 3f, ...$ in succession. We realize that the required string lengths are the terms of the harmonic series:

$$\sum_{n=1}^\infty \frac{1}{n}$$

We tell Koko that this is what the harmonic series sounds like in music:

Koko is intrigued, but not yet impressed. Koko wants to hear them layered. This request demands another horse tail hair.

We narrowly avoid getting kicked by the horse and obtain a second string, equal in length and thickness to the first. Now we can play two notes at once.

let's figure out what sounds good

Try adjusting the frequency on both strings.

Which frequencies sound good together?

It seems like simple ratios create consonant intervals, and complex ratios create dissonant intervals. Now what? What can we do with this information?

Koko reminds us of our promise: we wanted to construct a tuning system, so we can record and recreate beautiful sequences of sounds to enjoy together by the campfire. Right. What have we found so far?

Earlier we discovered that doubling the frequency gives us the same note in a higher register. That means we have these repeating units we can divide the continuum into: $[1f, 2f), [2f, 4f), [4f, 8f)...$

We'll focus on just one of these units: the interval between $1f$ and $2f$ . We'll find notes we want to play within the interval $[1f, 2f)$ . Those notes will comprise our tuning system.

let's find those notes

Starting from $1f$ , we want to get the most consonant sounds, so let's apply the simplest ratios to it.

2:1 gives us $2f$ , which is the start of the next interval.

3:1 gives us $3f$, which is, again, outside our interval, but we can divide it by 2 to bring it back in bounds. Now we have our second note: it has frequency $(3/2)f$.

Now we can keep going and try bringing $4f, 5f, 6f, ...$ into $[1f, 2f)$ , but this approach will give us many of the same notes. E.g. $4f$ will need to be divided by 4, and that gives us $1f$ again. $6f$ will also need to be divided by 4, giving us $(3/2)f$ again.
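To see the repeats, here is a small sketch that folds each harmonic back into $[1f, 2f)$ by halving. The helper name `fold_into_octave` is made up for illustration, and ratios are kept as exact fractions relative to $f$.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve or double a positive frequency ratio until it lies in [1, 2)."""
    ratio = Fraction(ratio)
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

for n in range(1, 9):
    print(f"{n}f -> {fold_into_octave(n)}")
```

Every even harmonic lands on a note we already had, so only the odd harmonics contribute anything new.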

Let's try a different approach. From $1f$ , we went up to $(3/2)f$ . What if we wanted to play that interval, starting from $(3/2)f$ ? What if we kept going up by $3/2$ and dividing by powers of 2 to bring the note back inside $[1f, 2f)$ ? Once we return to $1f$ , we'll have exhausted all possible notes, right? Let's try that:

(3/2)^0 = 1  
(3/2)^1 = 1.5  
(3/2)^2 = 1.125    (after dividing by 2)  
(3/2)^3 = 1.6875   (after dividing by 2)  
(3/2)^4 = 1.2656   (after dividing by 4)  
(3/2)^5 = 1.8984   (after dividing by 4)  
(3/2)^6 = 1.4238   (after dividing by 8)  
(3/2)^7 = 1.0678   (after dividing by 16)

We've returned to a number pretty close to 1, but it doesn't seem like we'll ever return to 1 and complete the cycle. Let's verify this.

If it were possible, we'd have

$$1 \cdot (3/2)^n (1/2)^k = 1$$

which simplifies to $3^n = 2^{n+k}$. For any $n \geq 1$, the LHS is odd and the RHS is even, so we have a contradiction. It's not possible to return to our starting frequency.

We currently have 7 notes in between:

The points seem a bit sparse. Let's keep going:

(3/2)^0 = 1  
(3/2)^1 = 1.5  
(3/2)^2 = 1.125    (after dividing by 2)  
(3/2)^3 = 1.6875   (after dividing by 2)  
(3/2)^4 = 1.2656   (after dividing by 4)  
(3/2)^5 = 1.8984   (after dividing by 4)  
(3/2)^6 = 1.4238   (after dividing by 8)  
(3/2)^7 = 1.0678   (after dividing by 16)  
(3/2)^8 = 1.6018   (after dividing by 16)  
(3/2)^9 = 1.2013   (after dividing by 32)  
(3/2)^10 = 1.8020  (after dividing by 32)  
(3/2)^11 = 1.3515  (after dividing by 64)  
(3/2)^12 = 1.0136  (after dividing by 128)

After 12 iterations, we arrive at 1.0136 — much closer to the starting frequency.
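The list above can be reproduced with a short sketch. It uses exact fractions so that no rounding creeps in while stacking; the function name `stacked_fifths` is hypothetical. Printed decimals are rounded to four places, so a last digit may differ from the truncated values in the list.

```python
from fractions import Fraction

def stacked_fifths(count):
    """Return (3/2)^n folded into [1, 2) for n = 0..count, as exact fractions."""
    notes = []
    ratio = Fraction(1)
    for _ in range(count + 1):
        notes.append(ratio)
        ratio *= Fraction(3, 2)
        while ratio >= 2:  # drop by octaves until back inside [1, 2)
            ratio /= 2
    return notes

notes = stacked_fifths(12)
for n, ratio in enumerate(notes):
    print(f"(3/2)^{n} -> {ratio} = {float(ratio):.4f}")
```

The twelfth value is exactly $3^{12}/2^{19} = 531441/524288$, which is close to 1 but not equal to it.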

This seems like a good point to stop — we now have pretty good coverage of the interval. If we kept going until 24, it would look like this:

This is no good. The new additions seem to be a repetition of the first 12 notes, just with a small shift to the right. This confirms that 12 is the optimal number to divide our interval into.
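One way to put a number on this is to measure how far $n$ stacked fifths land from a whole number of octaves. The sketch below does that in cents (1200 cents per octave, a standard unit brought in here for illustration); the function names are hypothetical. It checks only this one criterion, how closely the cycle returns to the start, so treat it as supporting evidence rather than a proof.

```python
import math

def octave_miss(n):
    """How far n stacked fifths land from a whole number of octaves, in octaves."""
    position = n * math.log2(3 / 2)
    frac = position - math.floor(position)
    return min(frac, 1 - frac)

def closest_return(max_fifths):
    """Number of fifths in 1..max_fifths that lands nearest the starting note."""
    return min(range(1, max_fifths + 1), key=octave_miss)

print(closest_return(24))
for n in (5, 7, 12, 24):
    print(n, round(octave_miss(n) * 1200, 1), "cents")
```

Among the first 24 steps, 12 fifths come closest to closing the circle, and going on to 24 roughly doubles the miss instead of shrinking it.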

We assign letters to the first 7 notes: A, B, C, D, E, F, G. The 5 that come after are in between A $\to$ B, C $\to$ D, D $\to$ E, F $\to$ G, and G $\to$ A. To refer to those notes, we append the $\sharp$ symbol to the former of each pair, or $\flat$ to the latter.

Great. We now have ourselves a tuning system composed of 12 notes per unit. Koko is finally impressed.

Okay. Enough pretending to be smart monkeys. Serious talk from here on.

The monkeys we imagined ourselves to be are geniuses. The logic mirrors how humans really discovered these relationships — though it took centuries, not one afternoon by the campfire. The ancient Greeks, medieval monks, and Renaissance composers all grappled with these same mathematical puzzles.

We've built a tuning system from the ground up and discovered that it is optimal to divide an octave into 12. But the system we use today has 12 equal divisions. How did we get there?

let's explore tuning systems

The system we built is referred to as the Pythagorean tuning system. It was used in medieval times, when thirds were considered dissonant.

Other than the obvious flaw (can't return to 1 to close the circle), major thirds sound horrible in this system. The ratio for a major third is supposed to be 5/4. But this tuning gives us a complex ratio: 81/64.
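Here is where 81/64 comes from, as a sketch: the Pythagorean major third is four fifths up and two octaves down. The gap to the pure 5/4 is measured in cents (1200 cents per octave), a unit introduced here for illustration.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

pythagorean_third = Fraction(3, 2) ** 4 / 4  # four fifths up, two octaves down
pure_third = Fraction(5, 4)
gap = pythagorean_third / pure_third

print(pythagorean_third, gap, round(cents(gap), 2))
```

The two thirds differ by a ratio of 81/80, a bit more than a fifth of a semitone.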

Is there a better way? Instead of only using fifths (3/2) as building blocks, what if we also used the fourth and the third, and combined them to construct intervals that aren't in the harmonic series?

E.g. let's start with C. We add the following lower-order harmonics to our system:

Perfect fifth: 3:2 → G
Perfect fourth: 4:3 → F
Major third: 5:4 → E

We could then combine these intervals by stacking them:

(major third) + (perfect fourth) = major sixth
- $(5/4)(4/3) = 5/3$ → A
(major third) + (perfect fifth) = major seventh
- $(5/4)(3/2) = 15/8$ → B
(perfect fifth) - (major third) = minor third
- $(3/2) / (5/4) = 6/5$ → E♭
(perfect fifth) + (perfect fifth) - octave = major second
- $(3/2)^2 (1/2) = 9/8$ → D
and so on
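The stacking above can be checked with exact fractions. This is a minimal sketch: adding intervals multiplies their ratios, and subtracting divides. The dictionary layout and variable names are made up for illustration, and every ratio is relative to C.

```python
from fractions import Fraction

# Building blocks from the text, as exact ratios relative to C.
fifth = Fraction(3, 2)
fourth = Fraction(4, 3)
major_third = Fraction(5, 4)

# Stacking ("adding") intervals multiplies ratios; "subtracting" divides.
just_notes = {
    "D": fifth * fifth / 2,         # fifth + fifth - octave
    "E-flat": fifth / major_third,  # fifth - major third
    "E": major_third,
    "F": fourth,
    "G": fifth,
    "A": major_third * fourth,      # major third + perfect fourth
    "B": major_third * fifth,       # major third + perfect fifth
}

for name, ratio in sorted(just_notes.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio}")
```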

The system constructed in this manner has pure intervals, but modulating becomes a nightmare: each key needs retuned ratios. The F# in A major is different from the F# in D major.
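To make the F# example concrete, here is a sketch under stated assumptions: C is the reference at 1, D and A sit at the 9/8 and 5/3 found above, and each key builds its own notes from pure ratios above its tonic. F# is then the major third above D in D major and the major sixth above A in A major.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve a ratio until it drops below 2 (inputs here are all >= 1)."""
    while ratio >= 2:
        ratio /= 2
    return ratio

d = Fraction(9, 8)  # D relative to C
a = Fraction(5, 3)  # A relative to C

f_sharp_in_d = fold_into_octave(d * Fraction(5, 4))  # major third above D
f_sharp_in_a = fold_into_octave(a * Fraction(5, 3))  # major sixth above A

print(f_sharp_in_d, f_sharp_in_a, f_sharp_in_d / f_sharp_in_a)
```

Under these assumptions the two F#s are 45/32 and 25/18, which differ by a ratio of 81/80. A fixed-pitch instrument can only be tuned to one of them.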

This system — Just Intonation — emerged during the Renaissance:

Shortly after, meantone temperament came about. It mirrors the Pythagorean system: where Pythagorean tuning sacrifices pure thirds for pure fifths, meantone sacrifices pure fifths for pure thirds.

In a fixed-pitch 12-note system, it is mathematically impossible to have both pure thirds and pure fifths. This is a prime factorization incompatibility.
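One way to read that incompatibility, as a short worked example. Any note reached by stacking pure fifths and shifting by octaves has a ratio of the form $3^a \cdot 2^b$ for integers $a$ and $b$. A pure major third needs the prime 5:

$$
3^a \cdot 2^b = \frac{5}{4} \quad\Longleftrightarrow\quad 3^a \cdot 2^{b+2} = 5
$$

By unique prime factorization, no integers $a$ and $b$ satisfy this. The nearest the fifths get is four of them, dropped two octaves:

$$
\left(\frac{3}{2}\right)^{4} \cdot \frac{1}{4} = \frac{81}{64} \neq \frac{80}{64} = \frac{5}{4}
$$

So a chain of pure fifths always misses the pure third by a factor of 81/80.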

Since we can't have everything, we gave up nearly everything: apart from the octave itself, the tuning system we use nowadays doesn't have a single pure ratio. Because we divide the octave into 12 equal steps for convenience, every other interval is slightly out of tune. Perfect pitch is really just perfect memory of imperfectly tuned notes.

This system is called 12TET, or 12-tone equal temperament.
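In 12TET each of the 12 steps multiplies the frequency by $2^{1/12}$, so an interval of $k$ steps has ratio $2^{k/12}$. The sketch below compares a few of these with the pure ratios from earlier, in cents (1200 per octave, so 100 per step). The function names are hypothetical.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (100 cents per 12TET step)."""
    return 1200 * math.log2(ratio)

def tet_deviation_cents(semitones, just_ratio):
    """Cents by which the 12TET interval is above (+) or below (-) the pure ratio."""
    return cents(2 ** (semitones / 12)) - cents(just_ratio)

# (steps in 12TET, pure ratio) for a few intervals named in the text
intervals = {
    "major third": (4, 5 / 4),
    "perfect fourth": (5, 4 / 3),
    "perfect fifth": (7, 3 / 2),
}
for name, (semitones, ratio) in intervals.items():
    print(f"{name}: {tet_deviation_cents(semitones, ratio):+.2f} cents")
```

The fifth and fourth are off by about 2 cents, while the major third is sharp by nearly 14.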

[Interactive keyboards: 12-TET vs Just Intonation. Click keys to hear the difference. The 12-TET keyboard shows deviations from pure Just Intonation ratios.]

In the 1800s, mass production of pianos demanded standardization. 12TET offered a reprieve from choosing between pure fifths, pure thirds, and modulatory freedom.

Musicians could modulate freely, piano makers could build one instrument for all repertoire, and the slight deviations from pure tones stayed below perceptual thresholds... for the most part:

This is why Jacob Collier's harmonies sound out of this world.

12TET is close enough to perfection. Math demanded compromise, culture chose convenience, network effects locked in the standard. If we demanded perfection, guitars would have squiggly frets.