It was thus said that the Great Soni They/Them L. once stated:
> On 2018-02-25 10:58 PM, Sean Conner wrote:
> >It was thus said that the Great Soni They/Them L. once stated:
> >>On 2018-02-25 04:51 PM, Sean Conner wrote:
> >>
> >>Are there cases of static linking where you can't just change the symbol
> >>names?
> >   Well, you can change the name in the source code (more portable,
> >probably the "easiest" way).  Changing the name after you have object
> >code is ... I don't know.  If you can, it's most likely very system
> >dependent.
> >
> 
> I assume static linking relies on dlopen(NULL, ...)

  Static linking has *nothing* to do with dlopen().  That's an entirely
different process.  Okay ...

  Static linking.  I'll ignore dynamic linking for this example. You have
two C files:

	---[ c1.c ]---

	extern int C();

	int A() { ... }
	int B() { C(); ... }

	---[ c2.c ]---

	extern int A();
	extern int B();

	int C()    { A(); A(); ... }
	int main() { ... }

  You compile each separately and end up with object files.  Each object
file has something like this:

	---[ c1.o ]---

	DEFINED
		A	100
		B	200

	UNDEFINED
		C	210

	---[ c2.o ]---

	DEFINED
		C	100
		main	200

	UNDEFINED
		A	110
		A	130

  Each DEFINED section lists a symbol's name and its offset into the program
portion of that object file.  So function A() lives at offset 100 in the
program portion of c1.o, and B() lives at offset 200.  Note how c2.o has
different names but the same offsets (they aren't addresses *yet*).

  The UNDEFINED section lists symbols that are referenced but not defined in
the source.  For c1.o, we have function C().  The offset here is where we
need to store the address of C() when linking.  In c2.o, we call A() twice,
and thus need the address twice, at offsets 110 and 130 (in c2.o).

  A *linker* will take these two object files and stitch (or *HINT HINT*
LINK) them together.  During the linking phase, the offsets become actual
addresses and everything is patched up.  So, if c1.o is placed at address
400 and c2.o at address 900, we end up with the following layout:

	address		comment
	500		start of function A
	600		start of function B
	610		address of function C stored here as part of CALL instruction
	1000		start of function C
	1010		address of function A stored here as part of CALL instruction
	1030		address of function A stored here as part of CALL instruction
	1100		start of function main

  Everything is defined, nothing needs to be loaded, and the program runs.  
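
  The patching step can be sketched as a toy "linker" in C.  This is only
an illustration of the bookkeeping, not how a real linker is written: the
struct layout, the names resolve() and link_report(), and the base
addresses 400 and 900 are all assumptions made up for this sketch.  Each
object file's program portion is placed whole at its base address, so a
final address is simply base + offset.

```c
#include <stdio.h>
#include <string.h>

/* One entry in a DEFINED or UNDEFINED list: a name plus an offset
 * into the object file's program portion. */
struct sym {
    const char *name;
    int         offset;
};

/* A toy object file.  `base` is the (assumed) address at which the
 * linker places this file's program portion. */
struct object {
    const char *file;
    int         base;
    struct sym  defined[4];
    int         ndefined;
    struct sym  undefined[4];
    int         nundefined;
};

/* The two object files from the static linking example. */
const struct object demo[2] = {
    { "c1.o", 400, { { "A", 100 }, { "B", 200 } },    2,
                   { { "C", 210 } },                  1 },
    { "c2.o", 900, { { "C", 100 }, { "main", 200 } }, 2,
                   { { "A", 110 }, { "A", 130 } },    2 },
};

/* Final address of `name`, or -1 if no object file defines it. */
int resolve(const struct object *objs, int nobjs, const char *name)
{
    for (int i = 0; i < nobjs; i++)
        for (int j = 0; j < objs[i].ndefined; j++)
            if (strcmp(objs[i].defined[j].name, name) == 0)
                return objs[i].base + objs[i].defined[j].offset;
    return -1;
}

/* Walk every UNDEFINED entry, print the address being patched and the
 * value stored there, and return how many references stay unresolved. */
int link_report(const struct object *objs, int nobjs)
{
    int unresolved = 0;

    for (int i = 0; i < nobjs; i++) {
        for (int j = 0; j < objs[i].nundefined; j++) {
            const struct sym *ref    = &objs[i].undefined[j];
            int               site   = objs[i].base + ref->offset;
            int               target = resolve(objs, nobjs, ref->name);

            if (target < 0) {
                printf("%d\t%s is still undefined\n", site, ref->name);
                unresolved++;
            } else {
                printf("%d\tstore address of %s (%d)\n",
                       site, ref->name, target);
            }
        }
    }
    return unresolved;
}
```

  With these assumed bases, resolve(demo, 2, "A") gives 500 (400 + 100),
and link_report(demo, 2) returns 0 because every reference finds a
definition.  A name that no object file defines comes back as -1, which is
the case a real static link reports as an undefined symbol.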

  Okay, dynamic linking, part I.  We'll change things up a bit in the C
code:

	---[ c1.c ]---

	extern int printf();
	extern int C();

	int A() { printf(); ... }
	int B() { C(); ... }

	---[ c2.c ]---

	extern int getpid();
	extern int A();
	extern int B();

	int C()    { A(); A(); ... }
	int main() { getpid(); C(); ... }

  Compiling these to object files, we get:

	---[ c1.o ]---

	DEFINED
		A	100
		B	200

	UNDEFINED
		printf	110
		C	210

	---[ c2.o ]---

	DEFINED
		C	100
		main	200

	UNDEFINED
		getpid	110
		A	120
		A	150

  The link phase happens.  When the linker goes through the libc library, it
notices that it's a dynamic link library and, instead of including the object
code in the final executable, it just places references with some
additional information in the executable, which looks a lot like what we had
in the object files:

	DYNAMIC_LIB
		libc		We need this library at run time

	UNDEFINED
		printf	510	need address of printf stored here as part of CALL instruction
		getpid	1010	need address of getpid stored here as part of CALL instruction

  At *run time*, the system's dynamic loader (ld.so on Linux, which the
kernel starts along with the program) will locate the library libc, and
then patch up the executable with the locations of the required functions.
User code *does not call dlopen* for this.  This happens behind the scenes
(and yes, I've simplified this quite a bit but this is the gist of what
happens).

  Okay, dynamic linking, part II.  This is when the user code calls dlopen().
dlopen() will locate the proper library and ensure it's in memory.  The code
then calls dlsym() to return the locations of symbols out of the loaded
dynamic library.  This is what Lua uses to load Lua modules written in C.
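
  A minimal sketch of that dlopen()/dlsym() sequence, assuming a POSIX
system with <dlfcn.h> (older glibc versions also need -ldl at link time).
The helper names lookup_symbol() and getpid_by_name() are made up for this
example.  It passes NULL to dlopen() to get a handle on the running program
rather than naming a library file, which keeps the sketch free of
platform-specific library names.  The point to notice is that the symbol is
found by its *name*, as a string, at run time.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <sys/types.h>

/* Look `name` up at run time, starting from the main program.
 * Returns NULL if the symbol can't be found. */
void *lookup_symbol(const char *name)
{
    /* A NULL filename asks for a handle on the main program itself;
     * dlsym() on that handle also searches the libraries loaded at
     * startup, such as libc.  The handle is deliberately left open. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error */
    void *addr = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        return NULL;
    }
    return addr;
}

/* Call getpid() through a pointer found by name instead of through
 * a reference the linker resolved.  Returns -1 on lookup failure. */
pid_t getpid_by_name(void)
{
    /* ISO C leaves this conversion undefined, but POSIX requires it
     * to work for pointers returned by dlsym(). */
    pid_t (*fn)(void) = (pid_t (*)(void))lookup_symbol("getpid");

    return fn != NULL ? fn() : (pid_t)-1;
}
```

  Calling getpid_by_name() should return the same value as a direct call to
getpid(); the difference is only in who found the address.  In the direct
call the linker and loader patch it in, while here the program asks for it
by name.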

> /the global symbol 
> table. This means the symbol must be exported anyway, which does make it 
> trivial to rename it, as far as I know.

  How?  The name comes from the source code and is embedded into the object
file (and possibly the executable if it's part of a dynamic library loaded
at runtime).  

> I assume all other cases would have to be directly embedded in the Lua 
> source somehow. (I believe the procedure for this is described 
> somewhere, something to do with modifying luaL_openlibs?)

  Nope.  I describe one way here:

	http://boston.conman.org/2013/03/23.1

which involves adding some new entries to the package.searchers array (I
update package.loaders, but that's because I'm using Lua 5.1 in that
particular project).  Another way, when statically linking in C-based Lua
modules, is to register their luaopen_*() functions in the package.preload
table.

> Interesting. So C modules could be renameable if they supported what I 
> proposed.

  Again, how?

  -spc