lua-l archive


On Jan 10, 2012, at 1:00 PM, Jay Carlson wrote:
> You can cast anything to int and get *some* kind of result. This
> doesn't tell you much because pointers are surprisingly opaque in ANSI
> C, again to support machines with strange memory models. Here's a
> quote from the Sperry--uh I mean Unisys OS 2200 ANSI C compiler
> reference:
> ====
> A pointer in UC cannot be treated as an integer. A UC pointer is a
> two-word structure with the base virtual address (VA) of a bank in the
> first word and a bit-word pointer in the second word. The bit-word
> pointer is necessary since the 2200 hardware does not have byte
> pointers; the basic pointer in the 2200 hardware is a word (36-bit) VA
> pointer that can only point to words. The bit-word portion of the UC
> pointer has a bit offset in the first 6 bits of the word and a word
> offset in the lower 24 bits of the word. If you convert (cast) a UC
> pointer to a 36-bit integer (int, long, or unsigned), the bit offset
> is lost. Converting it back to a C pointer results in it pointing to a
> word boundary. If you add 1 to the integer before converting it back
> to a pointer, the pointer points to the next word, not the next byte.
> A 36-bit integer is not capable of holding all the information in a UC
> pointer.
> ====
> At a minimum you need 38 bits to address bytes--oh, did I mention
> they're 9-bit bytes? Anyway, bigger than a 36-bit int. Function
> pointers are eight words long and there simply is no integral type you
> can round trip them through.
> If somebody has access to an OS 2200 box it might be fun to see if Lua
> works. I had an account on the Unix subsystem (!) of one a zillion
> years ago.
> Jay

<Geezer mode engaged>

The character pointer representation was organized such that the compiler
could easily do a Left Circular Shift to convert it to a byte offset when
computing differences between pointers.  I think this resulted in the character
offset within a word being stored in the high order bits.  It has been more
than 20 years, so my memory could be faulty.

In 1995, I ported the 4.3BSD net2 release TCP/IP stack to this C compiler.
The BSD network code was pretty portable, even though it wasn't written to
the strict ANSI C standard of the time.  Now, the compiler does perform some
heroic work behind the scenes to make all this work, especially for char and
char* references, given the word-addressable nature of the machine and the
6-, 9-, 12- and 18-bit partial-word accesses in the instruction set.

Oh, and the CPU architecture is 1's complement and not 2's complement -- this
was the source of one bug found and fixed (and in the SCCS change logs
after I fed it back to Mike Karels at Berkeley).  Some code in the UDP stack
did something like:

	udppkt->uh_sum = -1; 		/* set checksum field to all 1 bits */

which of course was wrong; the portable version is:

	udppkt->uh_sum = ~0;

CPUs that have both +0 and -0 integer values have a certain charm.  Lua, I
suppose, could also enjoy that, given that the underlying floating point
representations used in most instances also have distinct +0 and -0 available.

That CPU also gives you a different sort of sense for "byte order".  Most
are used to big-endian and little-endian, where you have to translate from
the on-the-wire network representation to a host representation to do
arithmetic operations, etc.  On the 1100/2200 CPUs, typically the I/O systems
would take 8-bit bytes from network interfaces and drop them into the lower
8 bits of successive 9-bit quarterwords.   htonl() and ntohl() were more about
unpacking and packing the bits than swapping them around.  However, once you
got that definition right, the rest of the Berkeley network code in the stack
pretty much worked.  It helped that early on it was run on both big endian (Sun 68K)
and little endian (VAX, PDP-11) CPU architectures, which encouraged portable
code development.

Louis Mamakos