17

Why does this print the value of the memory address at 0x08480110? I'm not sure why there are 5 %08x arguments - where does that take you up the stack?

address = 0x08480110
address (encoded as 32 bit le string): "\x10\x01\x48\x08"
printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");

This example is taken from page 11 of this paper http://crypto.stanford.edu/cs155/papers/formatstring-1.2.pdf

2
  • Hello, the link you posted is down. Do you know by any chance if this paper is available somewhere else? I know the question has been answered, I'm just interested in the full examples.
    – Dimitroff
    Commented May 4, 2020 at 17:18
  • Is it this one? julianor.tripod.com/bc/formatstring-1.2.pdf
    – Dimitroff
    Commented May 4, 2020 at 17:42

5 Answers 5

24

I think that the paper provides its printf() examples in a somewhat confusing way because the examples use string literals for format strings, and those don't generally permit the type of vulnerability being described. The format string vulnerability as described here depends on the format string being provided by user input.

So the example:

printf ("\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|");

Might better be presented as:

/* 
 * in a real program, some user input source would be copied 
 * into the `outstring` buffer 
 */
char outstring[80] = "\x10\x01\x48\x08_%08x.%08x.%08x.%08x.%08x|%s|";

printf(outstring);

Since the outstring array is an automatic, the compiler will likely put it on the stack. After copying the user input to the outstring array, it'll look like the following as 'words' on the stack (assuming little endian):

outstring[0c]               // etc...
outstring[08] 0x30252e78    // from "x.%0"
outstring[04] 0x3830255f    // from "_%08"
outstring[00] 0x08480110    // from the ""\x10\x01\x48\x08"

The compiler will put other items on the stack as it sees fit (other local variables, saved registers, whatever).

When the printf() call is about to be made, the stack might look like:

outstring[0c]               // etc...
outstring[08] 0x30252e78    // from "x.%0"
outstring[04] 0x3830255f    // from "_%08"
outstring[00] 0x08480110    // from the ""\x10\x01\x48\x08"
var1
var2
saved ECX
saved EDI

Note that I'm completely making those entries up - each compiler will use the stack in different ways (so a format string vulnerability has to be custom crafted for a particular exact scenario. In other words, you won't always use 5 dummy format specifiers like in this example - as the attacker you'd need to figure out how many dummies the particular vulnerability would need.

Now to call printf(), the argument (the address of outstring) is pushed on to the stack and printf() is called, so the argument area of the stack looks like:

outstring[0c]               // etc...
outstring[08] 0x30252e78    // from "x.%0"
outstring[04] 0x3830255f    // from "_%08"
outstring[00] 0x08480110    // from the ""\x10\x01\x48\x08"
var1
var2
var3
saved ECX
saved EDI
&outstring   // the one real argument to `printf()`

However, printf doesn't really know anything about how many arguments have been placed on the stack for it - it goes by the format specifiers it finds in the format string (the one argument it's 'sure' to get). So printf() gets the format string argument and starts processing it. When it gets to the 1st "%08x" that will correspond to the 'saved EDI' in my example, then next "%08x" will print the saved ECX' and so on. So the "%08x" format specifiers are just eating up data on the stack until it gets back to the string the attacker was able to input. Determining how many of those are needed is something an attacker would do by a kind of trial and error (probably by a test run that has a whole slew of "%08x" formats until he can 'see' where the format string starts).

Anyway, when printf() gets to processing the "%s" format specifier, it has consumed all the stack entries up to where the outstring buffer resides. The "%s" specifier treats its stack entry as a pointer, and the string that the user has put into that buffer has been carefully crafted to have a binary representation of 0x08480110, so printf() will print out whatever is at that address as an ASCIIZ string.

1
  • i tried to recreate this in code. But I found format string and arguments to prontf() were passed to printf() not on the stack but in CPU registers. i just tried to pass 2 format strings and 2 integers, and they all were passed in registers. So i guess it depends on compiler and architecture if this technique can be exploited for stack attack Commented Nov 24, 2023 at 22:24
9

You have 6 format specifiers (5 lots of %08x and one of %s), but you do not provide values for those format specifiers. You immediately fall into the realm of undefined behaviour - anything could happen and there is no wrong answer.

However, in the normal course of events, the values passed to printf() would have been stored on the stack, so the code in printf() reads values off the stack as if the extra values had been passed. The function return address is on the stack, too. There is no guarantee that I can see that the value 0x08480110 will actually be produced. This sort of attack very much depends on the the specific program and faulty function call, and you might well get a very different value. The example code is most likely written assuming a 32-bit Intel (little-endian) CPU - rather than a 64-bit or big-endian CPU.


Adapting the code fragment, compiling it into a complete program, ignoring the compilation warnings, using a 32-bit compilation on MacOS X 10.6.7 with GCC 4.2.1 (XCode 3), the following code:

#include <stdio.h>

static void somefunc(void)
{
    printf("AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.|%s|\n");
}

int main(void)
{
    char buffer[160] =
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz01234";
    somefunc();
    return 0;
}

produces the following result:

 AAAAAAAAAAAAAAAA.0x000000A0.0xBFFFF11C.0x00001EC4.0x00000000.0x00001E22.0xBFFFF1C8.0x00001E5A.|abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz012345abcdefghijklmnopqrstuvwxyz01234|

As you can see, I eventually 'found' the string in the main program from the printf() statement. When I compiled it in 64-bit mode, I got a core dump instead. Both results are perfectly correct; the program invokes undefined behaviour, so anything the program does is valid. If you're curious, search for 'nasal demons' for more information on undefined behaviour.

And get used to experimenting with these sorts of issues.


Another variation

#include <stdio.h>

static void somefunc(void)
{
    char format[] =
        "AAAAAAAAAAAAAAAA.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
        ".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n"
        ".0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X.0x%08X\n";
    printf(format, 1);
}

int main(void)
{
    char buffer[160] =
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz012345"
        "abcdefghijklmnopqrstuvwxyz01234";
    somefunc();
    return 0;
}

This produces:

AAAAAAAAAAAAAAAA.0x00000001.0x00000099.0x8FE467B4.0x41000024.0x41414141
.0x41414141.0x41414141.0x2E414141.0x30257830.0x302E5838.0x38302578.0x78302E58
.0x58383025.0x2578302E.0x2E583830.0x30257830.0x2E0A5838.0x30257830.0x302E5838

You might recognize the format string in the hex output - 0x41 is capital A, for example.

The 64-bit output from that code is both similar and different:

AAAAAAAAAAAAAAAA.0x00000001.0x00000000.0x00000000.0xFFE0082C.0x00000000
.0x41414141.0x41414141.0x2578302E.0x30257830.0x38302578.0x58383025.0x0A583830
.0x2E583830.0x302E5838.0x78302E58.0x2578302E.0x30257830.0x38302578.0x38302578
2
  • Thank you, I understand this, but the paper then claims that this code "Will dump memory from 0x08480110 until a NUL byte is reached." why is is that putting the memory address at the beginning of the format string allows you to print the contents at that address when you get to the %s? Commented Apr 15, 2011 at 6:25
  • @Vikas: the article was written in 2001, when life was simpler. Things have changed since then. It notes that the format string has to be on the stack, and then you can get at its address more easily. Modern compilers would not place the format string on the stack; it would most likely be in the text segment (program code; readonly memory). Commented Apr 15, 2011 at 6:33
1

You misunderstood the paper.

The text you linked is assuming that the current position on the stack is 0x08480110 (look at the surrounding text). The printf() will dump data from wherever on the stack you happen to be.

The \x10\x01\x48\x08 at the beginning of the format string is merely to print the (assumed) address to stdout in front of the dumped data. In no way do these numbers modify the address from which the data is dumped.

0

You're correct about "take you up the stack", but only barely; it relies on the assumption that arguments are passed on the stack, rather than in registers. (Which, for a variadic function is probably a safe assumption, but still an assumption about implementation details.)

Each %08x asks for the 'next unsigned int argument' to be printed in hex; what actually occurs in that 'next argument' location is both architecture and compiler dependent. If you compare the values you get with /proc/self/maps for the process, you might be able to narrow down what some of the numbers mean.

0

A little theory

If you want to see actual trick to write at custom address jump to second part.

Lets try tweaking format string in printf() trick.

printf("ABABABAB");

But encoding a HEX address into a format string directly was not working. WHole point is masquerading some address which would be exploited for attack into stack, but my format string "ABABABAB" ended in .rodata section and nor in Stack as we wanted to.

Breakpoint 1, __printf (format=0x555555556004 "ABABABAB") at ./stdio-common/printf.c:28
(gdb) i args
format = 0x555555556004 "ABABABAB"

When this address is looked for in process memory map it is probably .rodata section:

      Start Addr           End Addr       Size     Offset  Perms  objfile
  0x555555554000     0x555555555000     0x1000        0x0  r--p   /home/drazen/proba/main
  0x555555555000     0x555555556000     0x1000     0x1000  r-xp   /home/drazen/proba/main
  0x555555556000     0x555555557000     0x1000     0x2000  r--p   /home/drazen/proba/main
  0x555555557000     0x555555558000     0x1000     0x2000  r--p   /home/drazen/proba/main
  0x555555558000     0x555555559000     0x1000     0x3000  rw-p   /home/drazen/proba/main

and check with readelf:

drazen@HP-ProBook-640G1:~/proba$ readelf  -p .rodata  main 
String dump of section '.rodata':
  [     4]  ABABABAB

So far OK, but weird part is when I dumped stack and expected to find ABABABAB string address in stack frame as argument passed to printf().

(gdb) i frame
Stack level 0, frame at 0x7fffffffddf0:
rip = 0x7ffff7de16f0 in __printf (./stdio-common/printf.c:28); saved rip = 0x555555555165
called by frame at 0x7fffffffde00
source language c.
Arglist at 0x7fffffffdde0, args: format=0x555555556004 "ABABABAB"

you can see return address to main() 0x555555555165, and expect to find format string address on stack at address 0x7fffffffdde0 But when we dump stack instead of format string address there is just 8 bytes of zeros where function argument should be, between __libc_start_call_main() stack frame return address and printf() stack frame return address:

(gdb) x/32gx $sp
0x7fffffffdde0: 0x0000000000000000  0x0000555555555165
0x7fffffffddf0: 0x0000000000000001  0x00007ffff7daad90
0x7fffffffde00: 0x0000000000000000  0x0000555555555149
0x7fffffffde10: 0x0000000100000000  0x00007fffffffdf08

So how is address of format string passed to prIntf()? When we dumped registers we saw format string address in rsi register.

(gdb) i r
rax            0x7ffff7f9b868      140737353726056
rbx            0x0                 0
rcx            0x0                 0
rdx            0x7fffffffdcf0      140737488346352
rsi            0x555555556004      93824992239624
rdi            0x7ffff7f9b780      140737353725824

Because function arguments (string address in this case) will be passed in rsi and rdi registers for purpose of speed and not in the stack we cant use format string and string arguments for this trick.

So we can just use strings created as local (automatic) variables to be put in stack, before return address in current stack frame.


Actual example

Anyway I tried this small example and it worked, printed out addresses put in local strings (created on stack). So we could use this trick to make local strings mimic addresses we want to access:

Sample code

We have to print 5 random values until we reached what we wanted, our local strings!

Using hexadecimal format %x showed HEX representation of strings avro, nana, loli on stack (using %s string format would cause segmentation fault because printf() would interpret those values as addresses of strings but those "addresses" are probably not in mapped area of the process or are in protected memory area):

Output

So now we used local variables on stack to "masquerade" as data access. But what if we can use this to try to write on that address?

Lets change last %X format specifier to %n. Instead of printing content of data on stack with %X, we will use this data as address of variable where printf() stores number of characters already printed. So idea is to gain write access to custom address.

printf("ABABABAB\n,%016llX\n,%016llX\n,%016llX\n,%016llX\n,%016llX\n,%016llX\n,%016llX\n,%n");

Our FAKE address 0x61616161616161 represented as ASCII "aaaaaaa" ends in %rax register, and printf will write at this address number of characters already printed (stored in r12):

(gdb) i r
rax            0x61616161616161    27410143614427489
rbx            0x555555556052      93824992239698


      0x00007ffff7df7c3c <+7180>:   jne    0x7ffff7df8276 <__vfprintf_internal+8774>
   => 0x00007ffff7df7c42 <+7186>:   mov    %r12d,(%rax)

But in our case this will use SEGV segmentation fault since address 0x61616161616161 is not mapped into process memory.

Continuing.
ABABABAB
,00007FFFFFFFDF08
,00007FFFFFFFDF18
,0000555555557DB8
,00007FFFF7F9BF10
,00007FFFF7FC9040
,0031313131313131
,0032323232323232
Program received signal SIGSEGV, Segmentation fault.

I hope this helps!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.