Fishy Words


Buffer Overflows

A buffer overflow is a type of vulnerability common to “memory unsafe” languages like C.

This post explains the exploit at two levels of detail:

No CS background

Imagine the CPU is a person working through a to-do list. Each step in the program is a line on the checklist:

1. Store 4 in X
2. Multiply X by 5
3. Print X

This program prints 20.
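To make the analogy concrete, here is a minimal sketch of the checklist as a tiny interpreter. It is written in Python so it runs safely. The step names "store", "multiply" and "print" and the function run are made up for illustration. The loop walks the list from top to bottom, the way the CPU works through its instructions one at a time.

```python
def run(program):
    """Work through the checklist one step at a time, top to bottom."""
    x = None
    printed = []
    for step, value in program:
        if step == "store":
            x = value
        elif step == "multiply":
            x = x * value
        elif step == "print":
            printed.append(x)
            print(x)
        else:
            raise ValueError(f"unknown step: {step!r}")
    return printed


program = [
    ("store", 4),     # 1. Store 4 in X
    ("multiply", 5),  # 2. Multiply X by 5
    ("print", None),  # 3. Print X
]
run(program)  # prints 20
```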

When a program needs to receive user input, it allocates a buffer (a reserved block of memory) to hold the text:

1. Ask user for name (limit 10 letters) # safe
name: _ _ _ _ _ _ _ _ _ _
2. Print "hello "
3. Print name

When the program runs step 1, the user is prompted to type some text, which is stored in the memory reserved for name. There are 10 letter slots allocated for the name, and the input is limited to 10 letters, so whatever the user types always fits.

A buffer-overflow vulnerability opens up when the programmer sets an input limit larger than the space actually allocated. This lets the user's input spill past the buffer and onto the instructions that follow it:

1. Ask user for name (limit 40 letters) # vulnerable
name: _ _ _ _ _ _ _ _ _ _
2. Print "hello "
3. Print name

Now the user can enter a carefully crafted name: aaaaaaaaaa2. Print "hacked"

The ten a's fill the name slots, and the rest of the input overflows onto the instructions:

1. Ask user for name (limit 40 letters)
name: a a a a a a a a a a
2. Print "hacked"
3. Print name
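The two checklists can be modelled as one flat strip of memory in which the 10 name slots sit directly in front of the text of step 2. This is a toy sketch in Python that assumes that layout. Real programs keep code and input in separate places, as the next section explains. ask_for_name and remaining_steps are hypothetical helpers. The only difference between the safe run and the vulnerable run is the limit passed in.

```python
NAME_SLOTS = 10  # letters allocated for the name


def load_memory():
    """Toy memory: 10 empty name slots, then the text of steps 2 and 3."""
    return bytearray(b"_" * NAME_SLOTS + b'2. Print "hello "\n3. Print name')


def ask_for_name(memory, text, limit):
    """Store up to `limit` letters starting at the first name slot.

    Nothing here checks `limit` against NAME_SLOTS; that missing check is the bug.
    """
    for i, letter in enumerate(text.encode()[:limit]):
        memory[i] = letter


def remaining_steps(memory):
    """Read back whatever now sits after the name slots."""
    return memory[NAME_SLOTS:].decode().split("\n")


attack = 'aaaaaaaaaa2. Print "hacked"'

safe = load_memory()
ask_for_name(safe, attack, limit=10)        # limit matches the 10 slots
print(remaining_steps(safe))                # ['2. Print "hello "', '3. Print name']

vulnerable = load_memory()
ask_for_name(vulnerable, attack, limit=40)  # limit exceeds the 10 slots
print(remaining_steps(vulnerable))          # ['2. Print "hacked"', '3. Print name']
```

With a limit of 10, the extra letters are simply dropped. With a limit of 40, the same input rewrites step 2, because "2. Print "hacked"" happens to be exactly as long as the instruction it replaces.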

More CS

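In practice, this is more complicated. When the OS loads a program into RAM, it places the code in one region of memory and sets aside a separate region called the stack. A local buffer like name lives on the stack, away from the instructions, so overflowing it doesn't simply overwrite the next step of the program.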

The stack is the program's breadcrumb trail: it records which functions have called each other up to this point. Imagine function A calls functions B and C. When the program runs function A, it first calls into B. The program now needs to remember where it is in function B and where to pick up again in function A. Each call adds a stack frame that holds this return point. When function B ends, the program removes the “function B” stack frame and goes back to where it was in function A.
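Here is a minimal sketch of that bookkeeping. It assumes each frame only needs to store the function's name and the point where the caller resumes. Strings like "A, line 2" stand in for the return addresses a real CPU saves. CallStack and its methods are hypothetical names chosen for this example.

```python
class CallStack:
    """Toy call stack: each frame records a function and where its caller resumes."""

    def __init__(self):
        self.frames = []

    def call(self, function, return_to):
        # Push a frame remembering where to go back to when `function` ends.
        self.frames.append((function, return_to))

    def ret(self):
        # Pop the finished function's frame and hand back the resume point.
        _function, return_to = self.frames.pop()
        return return_to

    def names(self):
        return [function for function, _ in self.frames]


stack = CallStack()
stack.call("A", "program start")
stack.call("B", "A, line 2")  # A calls B
print(stack.names())          # ['A', 'B']
print(stack.ret())            # A, line 2
stack.call("C", "A, line 3")  # back in A, which now calls C
print(stack.names())          # ['A', 'C']
```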

The stack also holds each function's temporary variables, as long as their size is known ahead of time. A lazy programmer might assume that no one would enter more than 20 characters for a name. They stack-allocate a 20-character name array and then read input into it without enforcing that limit.

The attacker can do the same sort of overflow, but they don't overwrite “code” as in the example above. Instead, they fill the name buffer with machine code of their own and keep writing past the end of it. Their goal is to reach the part of B's stack frame that tells the CPU where to go after function B finishes (the return address). They overwrite it with the RAM address of the name buffer itself. When B returns, the CPU jumps straight into the code they inserted.
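The overwrite can be simulated safely with a Python bytearray standing in for B's stack frame: 20 bytes of name buffer followed by an 8-byte saved return address. This is a simplified model, not a working exploit. The addresses are hypothetical, and the "machine code" is placeholder bytes. The model also ignores details a real compiler and OS add, such as the saved frame pointer, padding, and stack canaries. Addresses are packed little-endian, as on x86-64. make_frame, read_name_unchecked and return_address are illustrative names.

```python
import struct

# Hypothetical addresses for illustration; real ones depend on the program and OS.
BUFFER_ADDR = 0x1000    # where B's name array starts in memory
CALLER_ADDR = 0x4005D0  # where execution should resume in function A
NAME_SIZE = 20          # the fixed 20-character name array
ADDR_SIZE = 8           # a 64-bit return address


def make_frame(return_addr):
    """B's stack frame: the name buffer, then the saved return address."""
    frame = bytearray(NAME_SIZE + ADDR_SIZE)
    frame[NAME_SIZE:] = struct.pack("<Q", return_addr)  # little-endian, as on x86-64
    return frame


def read_name_unchecked(frame, data):
    """The bug: copy every byte of input without checking it fits in NAME_SIZE."""
    for i, byte in enumerate(data):
        frame[i] = byte  # bytes past the buffer land on the return address


def return_address(frame):
    """Where the CPU will jump when B returns."""
    return struct.unpack("<Q", bytes(frame[NAME_SIZE:]))[0]


frame = make_frame(CALLER_ADDR)
print(hex(return_address(frame)))  # 0x4005d0: back into A, as intended

# Attacker input: placeholder bytes standing in for machine code, padded to
# fill the buffer, followed by the buffer's own address.
code = b"<machine code>"
payload = code.ljust(NAME_SIZE, b"A") + struct.pack("<Q", BUFFER_ADDR)
read_name_unchecked(frame, payload)
print(hex(return_address(frame)))  # 0x1000: B now "returns" into the name buffer
```

In a real C program, the same effect comes from copying input into a fixed-size array without checking its length. Bounded reads such as fgets, which is told the buffer's size, exist to enforce that limit.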

It’s a complicated attack that requires understanding and breaking down the multiple layers of abstraction involved in running a program.