57

I am trying to embed binary blobs into an exe file. I am using mingw gcc.

I make the object file like this:

ld -r -b binary -o binary.o input.txt

I then look at the objdump output to get the symbols:

objdump -x binary.o

And it gives symbols named:

_binary_input_txt_start
_binary_input_txt_end
_binary_input_txt_size

I then try and access them in my C program:

#include <stdlib.h>
#include <stdio.h>

extern char _binary_input_txt_start[];

int main (int argc, char *argv[])
{
    char *p;
    p = _binary_input_txt_start;

    return 0;
}

Then I compile like this:

gcc -o test.exe test.c binary.o

But I always get:

undefined reference to _binary_input_txt_start

Does anyone know what I am doing wrong?

4
  • 8
    By the way, I was unaware of this method of pulling arbitrary data into an executable - nice. Commented Apr 13, 2010 at 5:48
  • What does this method offer that's not offered by .rc files?
    – rubenvb
    Commented Oct 20, 2011 at 9:36
  • 1
    @rubenvb Easier access to the content. It does not need calls to any resource APIs.
    – user877329
    Commented Mar 15, 2012 at 9:16
  • also github.com/graphitemaster/incbin
    – kervin
    Commented Feb 27, 2021 at 5:16

4 Answers

40

In your C program remove the leading underscore:

#include <stdlib.h>
#include <stdio.h>

extern char binary_input_txt_start[];

int main (int argc, char *argv[])
{
    char *p;
    p = binary_input_txt_start;

    return 0;
}

C compilers often (always?) seem to prepend an underscore to extern names. I'm not entirely sure why that is. I assume that there's some truth to this Wikipedia article's claim that

It was common practice for C compilers to prepend a leading underscore to all external scope program identifiers to avert clashes with contributions from runtime language support

But it strikes me that if underscores are prepended to all externs, then the namespace isn't really being partitioned very much. Anyway, that's a question for another day, and the fact is that the underscores do get added.
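If you want to check what your own toolchain does rather than guess, here is a minimal sketch. It assumes GCC or Clang, which predefine the macro `__USER_LABEL_PREFIX__` to whatever the compiler prepends to C identifiers (an underscore on targets like 32-bit MinGW, nothing on ELF Linux). The helper names `user_label_prefix` and `c_name_for_symbol` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-step stringification: XSTR expands its argument, STR quotes it. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Fallback for compilers that do not predefine the macro:
 * assume no prefix (this is an assumption, not a detected fact). */
#ifndef __USER_LABEL_PREFIX__
#define __USER_LABEL_PREFIX__
#endif

/* Returns "_" on targets that prepend an underscore, "" otherwise. */
const char *user_label_prefix(void)
{
    return XSTR(__USER_LABEL_PREFIX__);
}

/* Given a symbol name as objdump prints it, return the identifier to
 * write in C source, or NULL if the symbol lacks the prefix the compiler
 * would add (such a symbol cannot be named directly from C). */
const char *c_name_for_symbol(const char *symbol)
{
    const char *prefix = user_label_prefix();
    size_t n = strlen(prefix);

    assert(symbol != NULL);
    if (strncmp(symbol, prefix, n) != 0)
        return NULL;
    return symbol + n;
}
```

Calling `c_name_for_symbol("_binary_input_txt_start")` from a small `main` and printing the result should tell you which spelling to put in the `extern` declaration for the target you are compiling for.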

8
  • Wow... thanks a lot. This was driving me mad. I knew it must have been something simple. I have just debugged it and noticed that it was changing to __binary_input_txt_start
    – myforwik
    Commented Apr 13, 2010 at 6:11
  • @myforwik: just in case you're interested, I've posted a question asking why C does this: stackoverflow.com/questions/2627511/… Commented Apr 13, 2010 at 6:47
  • @Michael: The article's claim is true. The runtimes were written in assembler, which was free to use names without underscores prepended and could thereby be assured not to clash with any symbols defined in the C code, and conversely the C code had no way to access the symbols from the asm runtime code. Commented Aug 6, 2011 at 23:24
  • 1
    Does anyone know how much data can be embedded that way?
    – user877329
    Commented Mar 15, 2012 at 9:18
  • 1
    @aditya: perhaps there's a difference in that detail that depends on the target? Windows toolchains have a tendency to automatically add underscores to external names when targeting Win32 x86. I wouldn't be surprised if that doesn't happen for other targets (even Win32 x64). Commented Jan 9, 2014 at 18:24
9

From the ld man page:

--leading-underscore

--no-leading-underscore

For most targets default symbol-prefix is an underscore and is defined in target's description. By this option it is possible to disable/enable the default underscore symbol-prefix.

so

ld -r -b binary -o binary.o input.txt --leading-underscore

should be the solution.

6

I tested it in Linux (Ubuntu 10.10).

  1. Resource file:
    input.txt

  2. gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 [generates ELF executable, for Linux]
    Generates symbol _binary__input_txt_start.
    Accepts symbol _binary__input_txt_start (with the leading underscore).

  3. i586-mingw32msvc-gcc (GCC) 4.2.1-sjlj (mingw32-2) [generates PE executable, for Windows]
    Generates symbol _binary__input_txt_start.
    Accepts symbol binary__input_txt_start (without the leading underscore).
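Once the name resolves on your toolchain, using the data is the same everywhere. Below is a rough sketch with two hypothetical helpers, `blob_size` and `blob_to_string`, that take the start and end pointers as arguments so the code does not depend on either spelling. It relies on the end symbol pointing one past the last byte, and on the fact that the embedded bytes are raw (no terminating NUL is added).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes between the start and end symbols emitted by
 * `ld -b binary`. The end symbol points one past the last byte. */
size_t blob_size(const char *start, const char *end)
{
    assert(start != NULL && end != NULL && end >= start);
    return (size_t)(end - start);
}

/* Copy the blob into buf as a NUL-terminated string, truncating if buf
 * is too small. The embedded data has no terminator of its own, so one
 * is appended here. Returns the number of data bytes copied. */
size_t blob_to_string(const char *start, const char *end,
                      char *buf, size_t buf_len)
{
    size_t n = blob_size(start, end);

    if (buf_len == 0)
        return 0;
    if (n > buf_len - 1)
        n = buf_len - 1;
    memcpy(buf, start, n);
    buf[n] = '\0';
    return n;
}
```

In a real program you would declare both symbols as `extern char ...[];` using whichever spelling your compiler accepts, and pass the start and end arrays to these helpers.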

1
  • Using tdm-gcc 4.8.1, I must refer to the variables using the underscore.
    – hauzer
    Commented Oct 3, 2013 at 1:01
0

Apparently this feature is not present in OSX's ld, so you have to do it totally differently with a custom gcc flag that they added, and you can't reference the data directly, but must do some runtime initialization to get the address.

So it might be more portable to make yourself an assembler source file which includes the binary at build time, a la this answer.
