
I was looking at the symbols in the libc.a file and noticed some "ABS" symbols.

For example, there is the "_nl_current_LC_COLLATE_used" symbol.

Here is the output of readelf on the libc.a file.

The symbols:

File: libc.a(setlocale.o)

Symbol table '.symtab' contains 77 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   ...
    39: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _nl_current_LC_COLLATE_used
   ...


File: libc.a(uselocale.o)

Symbol table '.symtab' contains 34 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   ...
     6: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _nl_current_LC_COLLATE_used
   ...
   

File: libc.a(lc-collate.o)

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
   ...
     1: 0000000000000002     0 NOTYPE  GLOBAL DEFAULT  ABS _nl_current_LC_COLLATE_used
   ...

The relocations:


File: libc.a(setlocale.o)

Relocation section '.rela.text' at offset 0x1b98 contains 124 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
...
00000000000009a3  0000002700000009 R_X86_64_GOTPCREL      0000000000000000 _nl_current_LC_COLLATE_used - 5
...


Relocation section '.rela.data.rel.ro' at offset 0x2738 contains 13 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
...
0000000000000098  0000002700000001 R_X86_64_64            0000000000000000 _nl_current_LC_COLLATE_used + 0
...


File: libc.a(uselocale.o)

Relocation section '.rela.text' at offset 0x838 contains 29 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
...
0000000000000029  0000000600000009 R_X86_64_GOTPCREL      0000000000000000 _nl_current_LC_COLLATE_used - 5
...


So, the "_nl_current_LC_COLLATE_used" symbol is the target of:

  • two R_X86_64_GOTPCREL relocations

  • one R_X86_64_64 relocation

If I understand correctly, that means the address of the "_nl_current_LC_COLLATE_used" symbol is needed, so this symbol must be defined somewhere in the process memory.

But this symbol appears in 3 files:

  • twice as a "WEAK UNDEFINED" symbol; that's OK, the linker should find a definition later

  • once as an "ABS" symbol with the value "2"

There is no other definition anywhere else, so when I compile a simple helloworld.c file and link it against libc.a, the linker must use some definition of "_nl_current_LC_COLLATE_used" to produce the final binary, right?

But no section is associated with this symbol, only the value "2".

According to the ELF specification:

SHN_ABS: This value specifies absolute values for the corresponding reference. For example, symbols defined relative to section number SHN_ABS have absolute values and are not affected by relocation.

So is "2" the absolute address of this symbol? I don't think so.

Looking at the glibc source code, "_nl_current_LC_COLLATE_used" is defined (via a macro, using an asm directive) as a constant integer value of 2.

What I don't understand:

  • in the final binary, where is the "2" value stored? Is it up to the linker to create a new data section to store all the absolute symbol values?

  • if so, how does the linker know which size to use for each absolute symbol value? Here, looking at the source code, it seems that the value "2" should be stored as a 64-bit integer, but the "size" field in .symtab is set to 0

  • how does the linker know whether the symbol should be stored in a read-only (R) or read-write (RW) data section?

1 Answer

Let's go through your questions.

If I understand correctly, that means the address of the "_nl_current_LC_COLLATE_used" symbol is needed, so this symbol must be defined somewhere in the process memory.

The address of the symbol is indeed needed; however, this does not necessarily mean the symbol must be placed in any section. The usual way for a symbol to get an address is to be placed in some section, with the linker computing the address from that placement. There are other ways for a symbol to get an address, though, and an ABS symbol is a perfectly valid one (it is used, e.g., in embedded code to place MMIO variables and similar things that must sit at a specific place in the address space).

In this particular case, _nl_current_LC_COLLATE_used is "placed" at address 2. This is a bogus address, of course: there's nothing there! But the glibc code never accesses this address, so it's fine (as far as C goes it is probably all sorts of UB, but well, it's glibc). The only use is glibc taking the address of the symbol, as in &_nl_current_LC_COLLATE_used != NULL. The code is only interested in whether the weak reference was left unresolved (the symbol then has address 0) or the linker used the absolute definition (the symbol then has address 2).
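The same trick can be reproduced in plain C with GCC or Clang on an ELF target. In this sketch, `my_category_used` is a hypothetical stand-in for `_nl_current_LC_COLLATE_used`: it is only declared weak and nothing defines it, so its address resolves to 0 and the function returns false. If another object in the link defined it as an absolute symbol (for instance with an assembler `.set my_category_used, 2` directive, similar in spirit to the glibc macro described in the question), the same check would return true without any memory ever being read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for _nl_current_LC_COLLATE_used. It is only
   declared (weakly) here and is not defined anywhere in this program. */
extern const char my_category_used __attribute__((weak));

/* Mirrors the glibc idea: test only the symbol's ADDRESS, never read
   through it. An unresolved weak reference has address 0 (NULL); an
   absolute definition such as ".set my_category_used, 2" elsewhere in
   the link would make it non-NULL. */
bool my_category_linked_in(void)
{
    return &my_category_used != NULL;
}
```

Because the symbol is weak, the compiler cannot fold the comparison to a constant; the answer is decided at link time.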

So is "2" the absolute address of this symbol?

Yes it is.
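To make that concrete, here is a minimal C sketch of how a static, non-PIE link could compute a symbol's final address from its symbol table entry. It is an illustration under simplifying assumptions, not how GNU ld is actually written: the function name `symbol_address` and the `section_vaddr` table are hypothetical; only `Elf64_Sym` and the `SHN_*` constants come from `<elf.h>`. The key point is that for `SHN_ABS` nothing is added to `st_value`.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: compute the final address a static, non-PIE link
   would give a symbol. section_vaddr[i] is the (assumed) output virtual
   address of input section i. */
uint64_t symbol_address(const Elf64_Sym *sym, const uint64_t *section_vaddr)
{
    if (sym->st_shndx == SHN_UNDEF) {
        /* Still undefined after the whole link: only legal for a weak
           reference, which resolves to 0. (A strong one would be an
           "undefined reference" error; not modeled here.) */
        return 0;
    }
    if (sym->st_shndx == SHN_ABS) {
        /* Absolute symbol: st_value already IS the address; no section
           base is added and relocation does not change it. */
        return sym->st_value;
    }
    /* Other reserved indices (e.g. SHN_COMMON) are not modeled. */
    assert(sym->st_shndx < SHN_LORESERVE);
    /* Ordinary symbol: section-relative offset plus where that section
       was placed in the output. */
    return section_vaddr[sym->st_shndx] + sym->st_value;
}
```

With the lc-collate.o entry from the question (`st_shndx == SHN_ABS`, `st_value == 2`), the function returns 2 regardless of where any section ends up.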

how does the linker know whether the symbol should be stored in a read-only (R) or read-write (RW) data section?

Absolute symbols are not placed in any section.

if so, how does the linker know which size to use for each absolute symbol value? Here, looking at the source code, it seems that the value "2" should be stored as a 64-bit integer, but the "size" field in .symtab is set to 0

The symbol's size is 0. The size of that symbol's address is 8 bytes.

in the final binary, where is the "2" value stored? Is it up to the linker to create a new data section to store all the absolute symbol values?

Yes, with a bit of a caveat. The linker will create a new section and store addresses there if necessary (not just absolute ones!). However, it might not have to.

For R_X86_64_64, the linker writes the constant 2 as the relocated value; the value is stored directly in the section being relocated (here .data.rel.ro), so no extra storage is necessary.
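A minimal sketch of what applying R_X86_64_64 amounts to, assuming a static, non-PIE link where the result is simply S + A written in place. The 8-byte width comes from the relocation type, not from the symbol's `st_size` (which is 0 here); this is why the symbol's address takes 8 bytes even though the symbol itself has size 0. `apply_r_x86_64_64` is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: apply R_X86_64_64 (result = S + A) in a static,
   non-PIE link. The result is written straight into the section contents
   at r_offset; no GOT slot or other extra storage is needed. The field is
   always 8 bytes wide because of the relocation type, whatever the
   symbol's st_size says. */
void apply_r_x86_64_64(uint8_t *section, uint64_t r_offset,
                       uint64_t S, int64_t A)
{
    uint64_t value = S + (uint64_t)A;
    /* x86-64 is little-endian: least significant byte first. */
    for (int i = 0; i < 8; i++)
        section[r_offset + i] = (uint8_t)(value >> (8 * i));
}
```

For the entry in the question (offset 0x98 in .data.rel.ro, addend 0, S = 2), it writes the bytes 02 00 00 00 00 00 00 00 at offset 0x98.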

For R_X86_64_GOTPCREL and similar relocations, the linker can sometimes perform a relaxation and end up with the same constant 2 as in the R_X86_64_64 case. For example, in my toy test the object file contained:

   0:   f3 0f 1e fa             endbr64
   4:   8b 05 00 00 00 00       mov    0x0(%rip),%eax
                        6: R_X86_64_GOTPCRELX   _nl_current_LC_COLLATE_used-0x4
   a:   c3                      ret

Note the RIP-relative addressing and the use of the GOT: the compiler accesses the symbol's address indirectly, so the address (2) should go into the GOT, and the location of that GOT entry is what the relocation resolves. However, the linker may sometimes figure out the value being loaded and optimize the load away. In the final linked executable:

  401000:       f3 0f 1e fa             endbr64
  401004:       c7 c0 02 00 00 00       mov    $0x2,%eax
  40100a:       c3                      ret

The linker figured out that the value loaded through that GOT access would always be 2, so it replaced the indirect load entirely with a simple immediate load.
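Here is a rough C sketch of that particular rewrite, assuming the REX-less 32-bit form shown above: `mov disp32(%rip), %r32` (opcode 0x8b, ModRM mod=00, rm=101) becomes `mov $imm32, %r32` (opcode 0xc7 /0, ModRM mod=11). `relax_gotpcrelx_mov_to_imm` is a hypothetical name; GNU ld's real conversion handles more instruction forms and checks more conditions than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: relax "mov sym@GOTPCREL(%rip), %r32" into
   "mov $sym, %r32" when sym's address S is a known constant.
   insn points at the 0x8b opcode byte; the ModRM byte and the 4-byte
   displacement follow. Only the REX-less 32-bit form is handled. */
bool relax_gotpcrelx_mov_to_imm(uint8_t insn[6], uint64_t S)
{
    if (insn[0] != 0x8b || (insn[1] & 0xc7) != 0x05)
        return false;                  /* not "mov disp32(%rip), %r32" */
    if (S > UINT32_MAX)
        return false;                  /* address doesn't fit in imm32 */
    uint8_t reg = (insn[1] >> 3) & 7;  /* destination register number */
    insn[0] = 0xc7;                    /* mov $imm32, r/m32 */
    insn[1] = 0xc0 | reg;              /* mod=11: register operand */
    for (int i = 0; i < 4; i++)        /* imm32, little-endian */
        insn[2 + i] = (uint8_t)(S >> (8 * i));
    return true;
}
```

Fed the bytes 8b 05 00 00 00 00 and S = 2, it produces c7 c0 02 00 00 00, the same encoding as in the final executable above. The instruction length stays 6 bytes, so no other code has to move.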

Now, if we disable relaxation, or if the linker cannot perform it because it can't prove the symbol's address will be constant no matter what, we get:

  401000:       f3 0f 1e fa             endbr64
  401004:       8b 05 d6 2f 00 00       mov    0x2fd6(%rip),%eax        # 403fe0
  40100a:       c3                      ret

Here, the indirect load stayed as is: 2 is loaded from memory. So where is it, and how did it get there?

   RELRO off    0x0000000000002fe0 vaddr 0x0000000000403fe0 paddr 0x0000000000403fe0 align 2**0
         filesz 0x0000000000000020 memsz 0x0000000000000020 flags r--

That's the read-only (RELRO) segment it was placed in.

  3 .got          00000008  0000000000403fe0  0000000000403fe0  00002fe0  2**3
                  CONTENTS, ALLOC, LOAD, DATA

And here is the section. The linker itself created this section to store the address 2 (and any other addresses referenced via GOT-relative relocations). Its contents are the value 2:

Contents of section .got:
 403fe0 02000000 00000000  
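The numbers above can be checked by hand. The x86-64 psABI defines R_X86_64_GOTPCREL as G + GOT + A - P, i.e. the address of the symbol's GOT entry plus the addend minus the address of the field being patched. The sketch below, using a hypothetical helper `gotpcrel_disp` and assuming a static link where the GOT entry simply holds S, fills the 8-byte GOT slot and computes the displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for the non-relaxed GOTPCREL(X) case in a static
   link: store the symbol's address S in its 8-byte GOT slot, then return
   the 32-bit displacement G + GOT + A - P, which here is simply
   got_slot_vaddr + A - P. */
int32_t gotpcrel_disp(uint8_t got_slot[8], uint64_t got_slot_vaddr,
                      uint64_t S, int64_t A, uint64_t P)
{
    for (int i = 0; i < 8; i++)          /* little-endian 64-bit address */
        got_slot[i] = (uint8_t)(S >> (8 * i));
    int64_t disp = (int64_t)got_slot_vaddr + A - (int64_t)P;
    assert(disp >= INT32_MIN && disp <= INT32_MAX);  /* must fit disp32 */
    return (int32_t)disp;
}
```

With the GOT entry at 0x403fe0, A = -4, and the displacement field at P = 0x401006 (two bytes into the instruction at 0x401004), the result is 0x2fd6, matching `mov 0x2fd6(%rip),%eax`. The CPU adds it to the address of the next instruction, 0x40100a, which gives 0x403fe0, and the slot holds 02 00 00 00 00 00 00 00 as in the dump above.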

Thanks, very detailed and useful answer!
