Note that for octuple-precision (256-bit) IEEE 754 floating point, the minimum buffer size is 82 bytes (including the null terminator but excluding any optional group separators). So with the alignment set to 8 bytes (64-bit architectures) or 16 bytes (128-bit architectures), a suitable stack buffer size to display the full precision would be 96 bytes. You could still reduce it to 64 bytes if you accept not displaying the full precision, which would typically be used only for intermediate calculations of aggregates over large high-precision datasets, such as collecting many measurements. That is still enough for nuclear research today, given the existing precision of units (currently around 20 decimal digits for some physical scales; this may evolve soon with the reform of the meter in the SI). The extra precision is only needed to collect many quantum measurements at very high frequencies before processing and rounding them, because they carry large margins of randomness where the wanted signal is hidden in a lot of quantum noise.
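As a minimal sketch of the sizing logic above: round the 82-byte minimum up to the stack alignment, which gives 96 (a multiple of both 8 and 16). The macro names here are hypothetical, not from any real luaconf.h:

#include <stdio.h>

/* Hypothetical names; nothing here comes from Lua's actual headers. */
#define OCT_N2STR_MIN  82   /* worst-case binary256 text, incl. '\0' */
#define STACK_ALIGN    16   /* a multiple of 16 is also a multiple of 8 */

/* round up to the alignment: ((82 + 15) / 16) * 16 == 96 */
#define OCT_N2STR_SIZE (((OCT_N2STR_MIN + STACK_ALIGN - 1) / STACK_ALIGN) \
                        * STACK_ALIGN)

int main(void) {
  printf("%d\n", OCT_N2STR_SIZE);  /* prints 96 */
  return 0;
}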
256-bit floating point is already of interest to researchers in AI and for processing "big data" sets that are for now still seen as very chaotic (e.g. in automated financial/trading applications, in meteorological or fluid-mechanics simulations, or in massively parallelized applications with lots of users, like multiplayer online games on commercial game servers where many users can feed in their own scripts: many concurrent small Lua scripts, changeable/loadable in real time without stopping the server).
Quad precision (128-bit) is already used in high-precision 3D manufacturing to control robots. I think it is already used in radioastronomy to control the shape of mirrors, in the nuclear research industry for accelerators, and for simulations of or research on dark matter and dark energy (I've read an article suggesting its use for large arrays of telescopes); it may already have applications in cryptography to speed up and secure the generation of keys with more challenging algorithms. Given that it has been available on consumer markets for several decades, and that there's an incentive with the 3D rendering of light effects and raytracing in popular games, existing 64-bit architectures will support it as part of their native vector instruction extensions.
Cloud computing with giant servers may already use them, but it would benefit as well from hardware implementations instead of relying on slow and energy-inefficient emulation with software libraries.
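For what the software-library route looks like in practice today, here is a small sketch of my own (not from this discussion) using GCC's libquadmath, which emulates binary128 on ordinary 64-bit hardware; note that a 48-byte buffer is already ample for full quad precision:

/* Software-emulated quad precision on 64-bit hardware.
   Build with: gcc demo.c -lquadmath */
#include <quadmath.h>
#include <stdio.h>

int main(void) {
  __float128 x = sqrtq(2.0Q);   /* sqrt(2) with a 113-bit significand */
  char buf[48];                 /* fits the 36 significant decimal digits
                                   of binary128, plus sign, point,
                                   exponent and '\0' */
  quadmath_snprintf(buf, sizeof buf, "%.36Qg", x);
  puts(buf);
  return 0;
}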
If you don't want to extend the stack size (and still don't plan to support octuple precision at its full precision, except by tweaking luaconf.h for these corner-case experimental architectures), using 48 instead of 50 would be just fine.
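Assuming the 50 vs. 48 being discussed is the number-to-string buffer size (it resembles MAXNUMBER2STR in Lua 5.3's lobject.c, but treat that mapping as my assumption), the tweak would amount to:

/* Assumption: this maps to Lua 5.3's MAXNUMBER2STR in lobject.c.
   48 still fits the longest "%.17g" double output plus '\0', and
   unlike 50 it is a multiple of both 8 and 16, so the stack buffer
   needs no padding. */
#define MAXNUMBER2STR 48  /* was 50 */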