The difficulty is in "any builtin types". That rule was sufficient in early C definitions, but it is no longer sufficient once the number of "builtin" types to support explodes. And the overhead of aligning large types is not justified, because it wastes memory. Even if memory is "cheap" now, computing devices are used to process far more data with far more concurrent threads, we never have too much memory, and paging memory to external storage remains very slow, too slow for the performance we count on to support many more concurrent transactions and far more data today, especially on servers running network services on the Internet.
There is still a need to spare memory, and the desire to avoid padding is still there (in C++ this is possible with the "new(pool parameters)" placement form of operator new for specific allocators, which is very convenient for managing objects that are no longer allocated by malloc/calloc and that do not necessarily carry the default padding requirements).
This is also true for many network protocol handlers that remove padding completely: when these protocol formats cannot be represented directly and efficiently as C/C++ structures, you have to implement very slow (de)serializers before processing the embedded data, which requires additional memory for the buffer transforms.
Being able to represent data structures with minimal padding (or all padding suppressed) is a constant need. That is where the C/C++ standards are evolving, and why compilers now offer various directives, pragmas, and declarators to control layout; but such control should become portable and part of the language itself. I don't care at all about the old C99 standard or older standards, or about the current limitations of GCC on the specific platforms it supports today: strict conformance to C99 does not apply to all platforms because it is insufficient to cover all use cases. So applications have to use various hacks, notably the very unsafe "pointer aliasing" which C/C++ allow but which causes so many portability problems. It is still difficult to represent complex data structures containing members of multiple datatypes; only arrays/vectors of native types are relatively safe, and even there with exceptions such as arrays of "long double" and the newest IEEE floating-point types, and there are still major difficulties representing bitsets. C was only tuned to support two types well, "char" and "int"; everything else requires hacks. Even "int" alone causes problems due to byte order, and bitfields have no reliable bit order either. C/C++ do not decouple the datatypes needed only for internal local processing from the datatypes needed for interchange; and even for internal processing, their model wastes memory when that processing has to handle lots of data, multithreading, or other memory-sharing mechanisms, including virtualized file mappings.
If you just count on strict C99 conformance, most programs will not run, as they won't interface with anything except through slow and wasteful (de)serializers (which may be very complex to write portably). That's why so many libraries have been developed separately to solve this problem for specific goals, but they add another layer of complexity: a variety of external APIs to support, which are not portable across OSes and device types.
Other languages don't have this problem: they describe precisely and unambiguously the datatypes they need, and it's up to the compiler to generate efficient code to support them. Programs are much simpler to write and port, and you don't need to write so many adaptation libraries for specific applications: this is centralized in the standard behavior and implementation of the compiler (this is what happens with Java, .NET, and now even JavaScript, and what is needed for Lua as well).
C/C++ is very difficult to port and test: programs written in these languages need to be tested on each specific platform. One solution would be to develop a virtual platform model and a separate VM engine for it (this is what is used to support a Linux-like system in 100% pure JavaScript; not only is this not inefficient, it is in fact very fast and removes a lot of otherwise necessary testing: it's easier to develop and test the VM itself than the tons of candidate programs using it). On such a modelized virtual platform, there is no longer any complex portability problem: programs are specifically tuned for that single virtual platform, which is then emulated and recompiled on the actual local machine using all sorts of optimizations that the zillions of original programs don't have to manage themselves. This is the same reason for the success of Java: one code running everywhere with the best performance on each target machine; the JIT compiler becomes an integral part of the VM, supporting the same virtual machine model with very precisely defined goals and rules. The same could apply to Lua and already applies to JavaScript, but C/C++ lag far behind. C/C++ should have been abandoned long ago for application development and kept only for implementing VMs (and in that case you don't even need all the complexity of C++: a "stronger C" is enough for most of the code, plus some native assembly code for the specific platform the VM is built for).