It was thus said that the Great Soni They/Them L. once stated:
On 2018-02-25 10:58 PM, Sean Conner wrote:
It was thus said that the Great Soni They/Them L. once stated:
On 2018-02-25 04:51 PM, Sean Conner wrote:
Are there cases of static linking where you can't just change the symbol
names?
Well, you can change the name in the source code (more portable, probably
the "easiest" way). Changing the name after you have object code is ... I
don't know. If you can, it's most likely very system dependent.
I assume static linking relies on dlopen(NULL, ...)
Static linking has *nothing* to do with dlopen(). That's an entirely
different process. Okay ...
Static linking. I'll ignore dynamic linking for this example. You have
two C files:
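---[ c1.c ]---
extern int C(void);
int A(void) { /* ... */ return 0; }
int B(void) { /* ... */ return C(); }   /* B calls C */
---[ c2.c ]---
extern int A(void);
extern int B(void);
int C(void) { A(); A(); /* ... */ return 0; }   /* C calls A twice */
int main(void) { /* ... */ return 0; }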
You compile each separately and end up with object files. Each object
file has something like this:
---[ c1.o ]---
DEFINED
A 100
B 200
UNDEFINED
C 210
---[ c2.o ]---
DEFINED
C 100
main 200
UNDEFINED
A 110
A 130
For the DEFINED sections, each entry contains a name and the offset into
the program portion where that name is defined. So function A() lives at
offset 100 in the program portion of c1.o, and B() lives at offset 200 in
the same program portion. Note how c2.o has different names but the same
offsets (they aren't addresses *yet*).
The UNDEFINED section lists symbols that are referenced but not defined in
the source. For c1.o, we have function C(). The offset here is where we
need to store the address of C() when linking. In c2.o, we call A() twice,
and thus need the address twice, at offsets 110 and 130 (in c2.o).
A *linker* will take these two object files and stitch (or *HINT HINT*
LINK) them together. During the linking phase, the offsets become actual
addresses and everything is patched up. So we might end up with the
following layout:
address  comment
500      start of function A
700      start of function B
710      address of function C stored here as part of CALL instruction
1000     start of function C
1010     address of function A stored here as part of CALL instruction
1030     address of function A stored here as part of CALL instruction
1100     start of function main
Everything is defined, nothing needs to be loaded, and the program runs.
Okay, dynamic linking, part I. We'll modify things a bit in the C
code:
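---[ c1.c ]---
extern int printf(const char *, ...);
extern int C(void);
int A(void) { printf("in A\n"); /* ... */ return 0; }   /* A calls printf */
int B(void) { /* ... */ return C(); }                   /* B calls C */
---[ c2.c ]---
extern int getpid(void);
extern int A(void);
extern int B(void);
int C(void) { A(); A(); /* ... */ return 0; }           /* C calls A twice */
int main(void) { getpid(); C(); /* ... */ return 0; }   /* main calls getpid and C */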
Compiling to object files, we get:
---[ c1.o ]---
DEFINED
A 100
B 200
UNDEFINED
printf 110
C 210
---[ c2.o ]---
DEFINED
C 100
main 200
UNDEFINED
getpid 110
A 120
A 150
The link phase happens. When the linker goes through the libc library, it
notices that it's a dynamic link library and instead of including the object
code into the final executable, it just places references with some
additional information in the executable that looks a lot like what we had in
object files:
DYNAMIC_LIB
libc            We need this library at run time
UNDEFINED
printf  510     need address of printf stored here as part of CALL instruction
getpid  1010    need address of getpid stored here as part of CALL instruction
At *run time*, the system's dynamic loader (ld.so on Linux, started by the
kernel before your program runs) will locate the library libc, and then
patch up the executable with the locations of the required functions. User
code *does not call dlopen* for this. This happens behind the scenes (and
yes, I've simplified this quite a bit but this is the gist of what happens).
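The run-time patch-up can be sketched the same way as the link step. This is a toy model only: struct import, struct export, and bind_imports() are names I made up for illustration, and a real loader works on the platform's executable format rather than on arrays like these.

```c
#include <string.h>

/* Hypothetical model: one UNDEFINED record left in the executable. */
struct import { const char *name; int patch_at; long resolved; };

/* Hypothetical model: one symbol exported by a shared library. */
struct export { const char *name; int offset; };

/* Once the library's load address (lib_base) is known, fill in the address
 * of every imported symbol. An import with no matching export gets -1.
 * Returns the number of imports left unresolved. */
int bind_imports(struct import *imports, int nimports,
                 const struct export *exports, int nexports, long lib_base)
{
    int missing = 0;
    for (int i = 0; i < nimports; i++) {
        imports[i].resolved = -1;
        for (int j = 0; j < nexports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                imports[i].resolved = lib_base + exports[j].offset;
                break;
            }
        }
        if (imports[i].resolved < 0)
            missing++;
    }
    return missing;
}
```

With the two records from the listing above (printf and getpid), the loader would call something like bind_imports() once libc has been mapped into memory, then store each resolved address at the matching patch_at location. A nonzero return corresponds to the familiar "undefined symbol" failure at program start.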
Okay, dynamic linking, part II. This is when the user code calls dlopen().
dlopen() will locate the proper library and ensure it's in memory. The code
then calls dlsym() to return the locations of symbols out of the loaded
dynamic library. This is what Lua uses to load Lua modules written in C.
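Here is a minimal sketch of that dlopen()/dlsym() sequence, assuming a POSIX system. The helper name lookup_symbol() is hypothetical. On older glibc versions you may also need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Look up `symbol` in the shared library at `path`. A NULL path asks for
 * the running program itself plus the libraries already loaded with it.
 * Returns the symbol's address, or NULL on failure. */
void *lookup_symbol(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        return NULL;
    }

    /* The handle is deliberately not passed to dlclose(): the returned
     * address is only valid while the library stays loaded. */
    return addr;
}
```

For a Lua module written in C, the symbol being looked up is the module's luaopen_ function, so a call would look something like lookup_symbol("./foo.so", "luaopen_foo"), where foo is a made-up module name. Note that the symbol name is passed as a plain string, which is why the name has to be present in the library's exported symbol table.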
/the global symbol
table. This means the symbol must be exported anyway, which does make it
trivial to rename it, as far as I know.
How? The name comes from the source code and is embedded into the object
file (and possibly the executable if it's part of a dynamic library loaded
at runtime).