Or equivalently, it can still use 9 bits for RK(B), with one of them (k) being part of the instruction code (so two instruction codes are allocated to the same "virtual" instruction).
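To illustrate that equivalence, here is a minimal sketch. It assumes a made-up opcode numbering in which the register and constant variants of an instruction differ only in their lowest opcode bit; OP_ADD_R, OP_ADD_K, rk_operand and describe are hypothetical names for illustration, not Lua's actual opcodes or functions.

```python
# Hypothetical opcode pair: the two codes differ only in bit 0, which plays the role of k.
OP_ADD_R = 0b0100010  # ADD with B naming a register
OP_ADD_K = 0b0100011  # ADD with B naming a constant


def rk_operand(opcode: int, b: int) -> int:
    """Rebuild a 9-bit RK(B) value from a 7-bit opcode and an 8-bit B field."""
    if not (0 <= opcode < 128 and 0 <= b < 256):
        raise ValueError("opcode must fit in 7 bits and B in 8 bits")
    k = opcode & 1        # the bit "borrowed" from the instruction code
    return (k << 8) | b   # 9 bits: the top bit selects constant vs register


def describe(opcode: int, b: int) -> str:
    """Render the operand as R[n] (register) or K[n] (constant)."""
    rk = rk_operand(opcode, b)
    kind = "K" if rk & 0x100 else "R"
    return f"{kind}[{rk & 0xFF}]"
```

With this numbering, describe(OP_ADD_R, 200) names register 200 and describe(OP_ADD_K, 200) names constant 200, so the pair of opcodes behaves like one "virtual" ADD with a 9-bit RK(B).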
You could just as well build a Lua VM with even more registers by allocating an additional instruction code that carries the extra register bits in an extension code, and then pack the instructions not as fixed 32-bit words but as a variable number of 16-bit entries. This is only of interest for compressing the bytecode and reducing its memory footprint. Many CISC ISAs do this because it lets more code fit in the internal cache: the compact CISC code is expanded into wide RISC-like instructions inside the internal instruction decoder, and those wide instructions can then be fed into multiple parallel pipelines.
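A rough sketch of what such a packing could look like, under assumptions of my own: each 16-bit entry holds a 4-bit opcode and three 4-bit register fields, and opcode 0xF is reserved as an extension prefix that carries the high 4 bits of each register. The field widths, the EXT value and the function names are all hypothetical; this is not an existing Lua format.

```python
from typing import List, Tuple

EXT = 0xF  # hypothetical opcode reserved as the extension prefix


def encode(op: int, a: int, b: int, c: int) -> List[int]:
    """Encode one instruction as one or two 16-bit words."""
    if not 0 <= op < EXT:
        raise ValueError("opcode must be in 0..14")
    if not all(0 <= r < 256 for r in (a, b, c)):
        raise ValueError("registers must be in 0..255")
    words = []
    if max(a, b, c) > 15:
        # Extension word: high 4 bits of A, B and C.
        words.append((EXT << 12) | ((a >> 4) << 8) | ((b >> 4) << 4) | (c >> 4))
    # Base word: opcode plus the low 4 bits of each register.
    words.append((op << 12) | ((a & 15) << 8) | ((b & 15) << 4) | (c & 15))
    return words


def decode(words: List[int]) -> List[Tuple[int, int, int, int]]:
    """Expand a stream of 16-bit words back into (op, A, B, C) tuples."""
    out = []
    hi_a = hi_b = hi_c = 0
    for w in words:
        op = w >> 12
        a, b, c = (w >> 8) & 15, (w >> 4) & 15, w & 15
        if op == EXT:
            # Remember the high bits; they apply to the next base word only.
            hi_a, hi_b, hi_c = a, b, c
            continue
        out.append((op, (hi_a << 4) | a, (hi_b << 4) | b, (hi_c << 4) | c))
        hi_a = hi_b = hi_c = 0
    return out
```

Instructions that only touch registers 0..15 take 16 bits; anything else takes 32 bits, so the saving depends entirely on how often the low registers are used.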
In my view RISC processors are disappearing: they are not really needed once the instruction decoder is integrated in the CPU (or the VM), because compact CISC code is then faster, as it reduces accesses to external memory.
So I'm convinced that Lua could (like other bytecode VMs, including the JVM for Java) use an ISA with variable instruction sizes (in multiples of 8 bits, with extension codes).
And I think I could even make Lua not just perform better, but also fit smaller devices (with little internal memory, or slow external memory in Flash or ROM). To give access to at least 256 registers, you'd need extension codes for the extra registers, but some registers are used very frequently in the bytecode, and a few of them could fit in the initial instruction byte without needing any extension code (as in the x86/x64 ISA).
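As a sketch of that idea for a one-operand instruction, assume a byte whose top 5 bits are the opcode and whose low 3 bits are the register, with the value 7 reserved to mean "the register number follows in the next byte". The layout, the ESC value and the function names are hypothetical, loosely modelled on how x86 folds a register number into some one-byte opcodes.

```python
from typing import Tuple

ESC = 7  # hypothetical escape value in the 3-bit register field


def encode_unary(op: int, reg: int) -> bytes:
    """Encode (op, reg) in one byte for registers 0..6, two bytes otherwise."""
    if not 0 <= op < 32:
        raise ValueError("opcode must fit in 5 bits")
    if not 0 <= reg < 256:
        raise ValueError("register must be in 0..255")
    if reg < ESC:
        return bytes([(op << 3) | reg])      # short form, no extension
    return bytes([(op << 3) | ESC, reg])     # escape + full register byte


def decode_unary(code: bytes, pos: int = 0) -> Tuple[int, int, int]:
    """Return (op, reg, next_pos) for the instruction starting at pos."""
    first = code[pos]
    op, reg = first >> 3, first & 7
    if reg == ESC:
        return op, code[pos + 1], pos + 2
    return op, reg, pos + 1
```

Registers 0 to 6 cost one byte per instruction here, and all 256 registers stay reachable at the price of one extra byte.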
As I described, Lua 5.4 has only 8 bits for B. It can still address 256 registers because it uses separate instructions for constants.
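For reference, the Lua 5.4 iABC layout described in lopcodes.h puts the opcode in bits 0-6, A in bits 7-14, k in bit 15, B in bits 16-23 and C in bits 24-31. A small sketch that splits a 32-bit instruction along those boundaries (the function names are mine, not Lua's):

```python
def decode_iabc(instr: int) -> dict:
    """Split a 32-bit Lua 5.4 iABC instruction into its fields."""
    return {
        "op": instr & 0x7F,          # 7 bits
        "A": (instr >> 7) & 0xFF,    # 8 bits
        "k": (instr >> 15) & 1,      # 1 bit
        "B": (instr >> 16) & 0xFF,   # 8 bits: registers 0..255
        "C": (instr >> 24) & 0xFF,   # 8 bits
    }


def encode_iabc(op: int, a: int, k: int, b: int, c: int) -> int:
    """Pack the fields back into one 32-bit word (no range checks)."""
    return op | (a << 7) | (k << 15) | (b << 16) | (c << 24)
```

Since B is a full 8 bits with no constant flag inside it, all 256 register numbers remain addressable.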