On 12/2/2017 8:44 AM, Meepen wrote:
I'm not sure what method x86 kernel mode uses to update the
virtual-to-physical lookup in the MMU, but I'd assume it uses
internal data structure swaps of some sort instead of functions
translating addresses, because of speed

Virtual-to-physical lookup is cached by the TLB [1]. It's just an on-demand lookup with cached results. The page tables themselves are managed by the OS kernel when setting up processes etc. During operation: (1) untouched memory pages cost nothing, since nothing untouched needs to be in the TLB; (2) if the memory location is in the cache, it's a TLB hit, the fast path; (3) on a TLB miss, the page tables are read and the cache updated, the slow path.

[1] https://en.wikipedia.org/wiki/Translation_lookaside_buffer
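
In software terms this is just memoization. A loose Lua analogy of the three cases above (walk_page_tables() here is only a made-up stand-in for the hardware page-table walk, not a real function):

-- Loose software analogy only: a TLB behaves like a memoized lookup.
local tlb = {}   -- small cache of recent translations

local function translate(virtual_page)
  local physical = tlb[virtual_page]
  if physical then
    return physical                          -- (2) TLB hit: fast path
  end
  physical = walk_page_tables(virtual_page)  -- (3) miss: slow page-table walk
  tlb[virtual_page] = physical               --     then cache the result
  return physical
end

-- (1) a page that is never touched never reaches translate(), so it costs nothing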

For TLB specs on a somewhat recent processor architecture, see AMD Jaguar [2]. It tries to reduce miss latencies via speculation and side caches. But if we are coding a single-threaded software app, there is no direct equivalent of hardware speedups that run in parallel.

[2] https://www.realworldtech.com/jaguar/6/

TLB miss rates are quite low; chip architects have put much more effort (and transistor budget) into branch prediction and DCache/ICache management. Modern OS kernels also use large pages, which reduces contention and minimizes the penalty of kernel-mode switches. And a modern OS on a modern CPU does not flush everything on a user-mode process switch these days. So most kinds of optimizations (all the low-hanging fruit) have been implemented.

I checked an Intel optimization manual; there is only one entry for the TLB: TLB priming. Make a memory read from an upcoming page in advance so that the TLB is updated early. This gives the CPU an opportunity to hide the TLB miss. But one is likely to see gains only when processing data with predictable read/write characteristics.
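
The software analogue, very loosely (all names below are made up), is to touch a lazily-filled cache once, early, so the one slow lookup happens where its latency can be hidden:

-- Analogy only: "prime" a lazily-filled cache before the work that
-- needs it, the way TLB priming reads from an upcoming page early.
local cache = setmetatable({}, {__index = function(t, k)
  local v = slow_lookup(k)      -- made-up slow path, like a page-table walk
  t[k] = v
  return v
end})

local _ = cache[upcoming_key]   -- prime: take the miss now, early
do_other_work()                 -- the miss latency can hide behind this
local v = cache[upcoming_key]   -- later: now a cheap cache hit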


On Dec 1, 2017 6:49 PM, "Soni "They/Them" L." wrote:



    On 2017-12-01 01:45 AM, Meepen wrote:

        Could you update the metatables on a coroutine switch at all?
        It'd be faster if you used the metatables at all, but would
        slow down minimally otherwise.


    Context switches instead of namespacing?

    Hmm... Maybe. It wouldn't be easy, because with context
    switches, any mistake leaks the wrong context. With
    namespacing, that problem is pretty much non-existent.

    I mean, what do modern CPUs and kernels use?

        On Nov 30, 2017 6:17 PM, "Soni "They/Them" L."
        <fakedme@gmail.com> wrote:

            Hi!

            I have some code that looks like this:

            -- replace the shared string metatable with a proxy whose
            -- __index forwards to the current coroutine's own metatable
            debug.setmetatable("", {__index=function(o,k)
              -- 'metatables' maps each running coroutine to its own
              -- set of per-type metatables
              local mt = metatables[coroutine.running()].string
              if mt then
                local __index = rawget(mt, "__index")
                if type(__index) == "function" then
                  return __index(o,k)
                else
                  return __index[k]
                end
              end
              error()  -- no per-coroutine string metatable installed
            end})

            It takes the metatables for the basic Lua types (string,
            number, nil, etc) and replaces them with a proxy metatable.
            This proxy metatable forwards to a different table based on
            the currently running coroutine. This gives me
            virtualization of those metatables.

            It's really slow (3-4x slower[1] than the default string
            metatable) and I'd like to make it faster. Is that possible?

            [1] - I haven't actually benchmarked it, but the default
            string metatable gives about 2 table accesses per operation;
            this thing does at least 8 when using globals, and that
            doesn't take into account interpreter overhead and all the
            function calls!
[snip]
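
Back to the Lua question for a moment: the same caching idea applies to the proxy in the original post. A rough, untested sketch (it reuses the external 'metatables' table from that code; the cache would need invalidating if a coroutine's string metatable is ever replaced):

-- Rough, untested sketch: hoist globals into upvalues and memoize the
-- per-coroutine string metatable, so the hot path is one upvalue read
-- plus a couple of table indexes instead of several global lookups.
local running = coroutine.running
local cache = setmetatable({}, {__mode = "k"})  -- weak keys: dead coroutines drop out

debug.setmetatable("", {__index = function(o, k)
  local co = running()
  local mt = cache[co]
  if not mt then
    mt = metatables[co].string    -- slow path, taken once per coroutine
    if not mt then error("no string metatable for this coroutine") end
    cache[co] = mt
  end
  local __index = rawget(mt, "__index")
  if type(__index) == "function" then
    return __index(o, k)
  else
    return __index[k]
  end
end})

No promises on the actual speedup, but that trims the per-access global and table lookups the [1] footnote above is counting.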


--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia