On 12/2/2017 8:44 AM, Meepen wrote:
I'm not sure what method x86 kernel mode uses to update the virtual-to-physical lookup in the MMU, but I'd assume it uses internal data structure swaps of some sort instead of functions translating addresses, because of speed.
Virtual-to-physical lookup is cached by the TLB [1]. It's just a kind of on-demand lookup, with cached results. The page tables themselves would be managed by the OS kernel when setting up processes etc. During operation:
(1) No penalty for untouched memory pages; nothing untouched needs to be TLB cached.
(2) TLB hit if the memory location is in the cache: fast path/lookup.
(3) Read the page tables and update the cache on a TLB lookup miss: slow path/lookup.
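To make the three cases concrete, here is a toy software model of that lookup in Lua. It is only a sketch: the names (PAGE_SIZE, page_table, tlb, stats, translate) and the frame numbers are made up for illustration, and a real TLB has a fixed capacity and an eviction policy, which this ignores.

```lua
-- Toy model of a TLB sitting in front of a page table.
-- All names and numbers here are hypothetical, for illustration only.
local PAGE_SIZE = 4096

-- Set up by the "kernel": virtual page number -> physical frame number.
local page_table = { [0] = 7, [1] = 3, [2] = 9 }

-- The "TLB": starts empty and only ever holds pages that were touched.
local tlb = {}
local stats = { hits = 0, misses = 0 }

local function translate(vaddr)
  local vpage = math.floor(vaddr / PAGE_SIZE)
  local offset = vaddr % PAGE_SIZE
  local frame = tlb[vpage]
  if frame then
    stats.hits = stats.hits + 1        -- case (2): fast path
  else
    stats.misses = stats.misses + 1    -- case (3): slow path, walk the table
    frame = page_table[vpage]
    if frame == nil then
      error("page fault at virtual page " .. vpage)
    end
    tlb[vpage] = frame                 -- cache the result for next time
  end
  return frame * PAGE_SIZE + offset
end

print(translate(4100))  -- first touch of page 1: miss
print(translate(4200))  -- same page again: hit
-- Page 2 was never touched, so it costs nothing: case (1).
```

In software, the hit path is still a function call plus a table access, so this only models the behaviour, not the speed.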
[1] https://en.wikipedia.org/wiki/Translation_lookaside_buffer
For TLB specs on a somewhat recent processor architecture, see AMD Jaguar [2]. It tries to reduce miss latencies via speculation and side caches. But if we code a single-threaded software app, there are no direct equivalents for hardware speedups which run in parallel.
[2] https://www.realworldtech.com/jaguar/6/
TLB miss rates are quite low; chip architects have put much more effort (and transistor budget) into branch prediction and DCache/ICache management. Plus, modern OS kernels use large pages, which reduces contention and minimizes the penalty of kernel mode switches. Also, modern OSes on modern CPUs do not flush everything on user-mode process switches these days. So most kinds of optimizations (all the low-hanging fruit) have been implemented.
I checked an Intel optimization manual; there is only one entry for TLB: TLB priming. Make a memory read for an upcoming page in advance so that the TLB is updated early. This gives the CPU an opportunity to hide a TLB miss. But one may well see gains only when processing data that has predictable read/write characteristics.
On Dec 1, 2017 6:49 PM, "Soni "They/Them" L." wrote:
On 2017-12-01 01:45 AM, Meepen wrote:
Could you at all update metatables on coroutine switch? it'd be faster if you used the metatables at all but would slow down minimally otherwise
Context switches instead of namespacing? Hmm... Maybe. It wouldn't be easy, because with context switches, any mistake leaks the wrong context. With namespacing, that problem is pretty much non-existent. I mean, what do modern CPUs and kernels use?
On Nov 30, 2017 6:17 PM, "Soni "They/Them" L." <fakedme@gmail.com> wrote:
Hi! I have some code that looks like this:
    debug.setmetatable("", {__index=function(o,k)
      local mt = metatables[coroutine.running()].string
      if mt then
        local __index = rawget(mt, "__index")
        if type(__index) == "function" then
          return __index(o,k)
        else
          return __index[k]
        end
      end
      error()
    end})
It takes the metatables for the basic Lua types (string, number, nil, etc) and replaces them with a proxy metatable. This proxy metatable forwards to a different table based on the currently running coroutine. This gives me virtualization of those metatables. It's really slow (3-4x slower[1] than the default string metatable) and I'd like to make it faster. Is that possible?
[1] - I haven't actually benchmarked it, but default string metatable gives about 2 table accesses per operation; this thing does at least 8 when using globals, and that doesn't take into account interpreter overhead and all the function calls!
[snip]
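For comparison, the "update metatables on coroutine switch" idea from the quoted exchange might look roughly like the sketch below. This is an assumption about what was meant, not anyone's actual code: contexts, restore and resume_in_context are hypothetical names, and it only covers the string metatable. It also shows the leak risk mentioned above, since any plain coroutine.resume that bypasses the wrapper runs with whatever metatable is currently installed.

```lua
-- Sketch: swap the shared string metatable around each resume.
-- 'contexts', 'restore' and 'resume_in_context' are hypothetical names.
local contexts = setmetatable({}, { __mode = "k" })  -- coroutine -> string metatable

-- Put the previous metatable back, then pass the resume results through.
local function restore(saved, ...)
  debug.setmetatable("", saved)
  return ...
end

local function resume_in_context(co, ...)
  local saved = debug.getmetatable("")
  debug.setmetatable("", contexts[co] or saved)
  -- coroutine.resume returns false plus a message on error instead of
  -- raising, so restore() runs whether the coroutine succeeds or fails.
  return restore(saved, coroutine.resume(co, ...))
end

-- Usage: this coroutine sees an extra string method, nobody else does.
local co = coroutine.create(function()
  return ("hi"):shout()
end)
contexts[co] = {
  __index = setmetatable({
    shout = function(s) return s:upper() .. "!" end,
  }, { __index = string }),
}

print(resume_in_context(co))  -- the coroutine can call :shout()
print(("hi").shout)           -- outside the coroutine the method is absent
```

With this approach the per-operation cost inside the coroutine is the normal metatable lookup; the extra work moves to the resume wrapper instead.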
-- Cheers, Kein-Hong Man (esq.) Selangor, Malaysia