Somebody else can add the C standard citations and correct me; I'm not on the net as I'm writing this.

On Aug 15, 2013 11:00 PM, "Andres Perera" <andres.p@zoho.com> wrote:
>
> On Wed, Aug 14, 2013 at 7:23 PM, Jay Carlson <nop@nop.com> wrote:
> > tl;dr: if you have all the constraints C did 25 years ago, printf/scanf are reasonable. If you take Go's concrete syntax for expressing structured data as a given, using string patterns seems more ergonomic. I don't see why we should be constrained by either in 2013, but I don't have any concrete solution other than for Lua. E4X, RIP.
> >
> > On Aug 13, 2013, at 5:28 PM, William Ahern wrote:
> >
> >> On Tue, Aug 13, 2013 at 02:18:43AM -0400, Jay Carlson wrote:
> >>> It is a cute idea, but just like printf, it's a symptom of a lack of expressive power at compile-time.
> >>> [...]
> >>> Most C compilers these days have special mechanisms hooked up to the
> >>> printf/scanf functions to warn of mismatches in type between the format
> >>> string and the arguments. So there's a special case again: you can't write
> >>> your own replacements with the same functionality, especially if you are
> >>> in the -Werror camp, where warnings are treated as errors.
> >>
> >> You can get close. Using C99's variable argument macros and C11's _Generic,
> >> you can translate at compile time each argument to a pair of arguments--type
> >> and value. This type information can then be used at run time by the printf
> >> implementation.
> >
> > Hmm, _Generic? OK, there's a couple hours wasted:
> >
>
> > ===
> > #define f_(X) _Generic((X), \
> >   long double: fmt_long_double, \
> >   double: fmt_double, \
> >   unsigned int: fmt_unsigned_int, \
> >   char *: fmt_string, \
> >   const char *: fmt_string)(X)
>
> then i guess i want to print an unsigned int always as %u, or another
> size appropriate format? why associate a type that encodes signedness
> and size (unsigned int) with a string representation of decimal (%u)
> over octal (%o)?

Because the signedness of integers in C is part of their type.

You snipped the rest of the code; I'll paste the end:

> Usage:
>  fmt("yarrrgh", 12.7, (unsigned int) 4000000000);
>  fmt_with_spec("%s %g %u\n", "yarrrgh", 12.7, (unsigned int) 4000000000);

The point of this is not just to build a debug aid like print() but also to support at least run-time diagnosis of argument/spec mismatches without requiring compiler support.
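A minimal sketch of that run-time diagnosis, assuming C11 and a compiler that decays string literals to `char *` inside `_Generic` (GCC and Clang do). Everything here is hypothetical illustration rather than the snipped code: the `struct fmt_arg` layout, the `fmt_tag` names, `fmt_check`, and the requirement that the caller wrap each argument in `f_()` by hand instead of having the macro do it. Only bare `%s %g %u %o %x %%` conversions are understood.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tags, one per argument type this sketch supports. */
enum fmt_tag { FMT_END, FMT_DOUBLE, FMT_UINT, FMT_STRING };

struct fmt_arg {
    enum fmt_tag tag;
    union { double d; unsigned int u; const char *s; } v;
};

static struct fmt_arg fmt_double(double d)
{ struct fmt_arg a = { .tag = FMT_DOUBLE, .v.d = d }; return a; }
static struct fmt_arg fmt_unsigned_int(unsigned int u)
{ struct fmt_arg a = { .tag = FMT_UINT, .v.u = u }; return a; }
static struct fmt_arg fmt_string(const char *s)
{ struct fmt_arg a = { .tag = FMT_STRING, .v.s = s }; return a; }

/* Select the tagging function from the static type of X. X is evaluated
   once, as the argument of the selected function. */
#define f_(X) _Generic((X), \
    double: fmt_double, \
    unsigned int: fmt_unsigned_int, \
    char *: fmt_string, \
    const char *: fmt_string)(X)

/* Returns 0 when every conversion in spec has a matching tagged argument
   and no arguments are left over, -1 otherwise. */
static int fmt_check(const char *spec, const struct fmt_arg *args)
{
    for (const char *p = spec; *p; p++) {
        enum fmt_tag want;
        if (*p != '%') continue;
        p++;
        switch (*p) {
        case '%': continue;                    /* literal percent sign */
        case 's': want = FMT_STRING; break;
        case 'g': want = FMT_DOUBLE; break;
        case 'u': case 'o': case 'x': want = FMT_UINT; break;
        default: return -1;                    /* unknown conversion or trailing '%' */
        }
        if (args->tag != want) return -1;      /* wrong type, or ran out of args */
        args++;
    }
    return args->tag == FMT_END ? 0 : -1;      /* reject leftover arguments */
}

/* Check first, print only if the spec and the tagged arguments agree. */
static int fmt_with_spec_(const char *spec, const struct fmt_arg *args)
{
    if (fmt_check(spec, args) != 0) {
        fputs("fmt_with_spec: argument/spec mismatch\n", stderr);
        return -1;
    }
    for (const char *p = spec; *p; p++) {
        if (*p != '%') { putchar(*p); continue; }
        switch (*++p) {
        case '%': putchar('%'); break;
        case 's': fputs((args++)->v.s, stdout); break;
        case 'g': printf("%g", (args++)->v.d); break;
        case 'u': printf("%u", (args++)->v.u); break;
        case 'o': printf("%o", (args++)->v.u); break;
        case 'x': printf("%x", (args++)->v.u); break;
        default: assert(!"fmt_check should have rejected this spec");
        }
    }
    return 0;
}

/* The argument list ends with an FMT_END sentinel; at least one argument
   is required in this sketch. */
#define fmt_with_spec(spec, ...) \
    fmt_with_spec_((spec), \
        (const struct fmt_arg[]){ __VA_ARGS__, { .tag = FMT_END } })
```

With these definitions, `fmt_with_spec("%s %g %u\n", f_("yarrrgh"), f_(12.7), f_(4000000000u))` prints the line and returns 0, while `fmt_with_spec("%s %s\n", f_("a1"))` returns -1 instead of reading a missing argument. Because the unsigned value is tagged by type and not by radix, `%u`, `%o`, and `%x` all accept it.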

C11 has been off my radar, so I didn't know there was a way of doing type dispatch at compile-time. I am not certain there really is, because I am confused by the wording of _Generic in relation to the pointer zoo.

> people that try to come up with format-inducing printf() replacements,
> where induction is based on argument type and the language doesn't
> support typeclasses, end up causing a combinatorial explosion of types
> whose only purpose is to define subtly distinct toprintf() methods

printf does not support every argument type, and there are a fixed number of numeric types anyway. If you get bored with all the weird ints you can start promoting them although this isn't transparent for scanf.

> whats a *real* problem with printf() in a safe memory model runtime,
> where you dont get segfaults in f("%s %s", a1) for reading a possibly
> NULL pointer??

(tl;dr: I don't understand what you mean by "safe memory model runtime"; my mental model of C and probably Go doesn't fit so I'm going to think out loud about how to implement what I think you mean.)

If you can handle variadic functions when the callee is confused about the types and number of its arguments without bringing down the runtime, you have a very odd C implementation. I'll suggest 286 protected mode with variable args passed in a fresh segment, although the 286 is a cliche of oddball but legal C implementations. You would still have a nasty pointer forgery/misinterpretation problem with doubles read out as the wide pointers such a platform uses, and I don't see any way out of that.[1] Read-only use might be fine, but "didn't crash" is different than "didn't output (attacker-controlled) garbage."

You could tag the values passed to all variadic functions--but that's what I just did for fmt_with_spec.

Another version is to alternate types and values as arguments. This doesn't look possible in C11, since using _Generic(X) twice will evaluate X twice. OTOH most of the compilers I care about support the GCC extension.

> (type induction is separate issue; in a weakly typed language you are
> already checking types for other purposes; why is this singled out??)

You can emit warnings for argument mismatch in printf at compile-time because the compiler knows the format string language. This is very useful. You can't do this for other languages. The best you can do in C11 is detection at runtime, and I didn't know that.

So why the heck is this worth writing about on lua-l? Let's back up.

1) We already handle variadic functions with strong typing, so runtime checks of a Lua fmt_with_spec are not a problem.[2] Unlike C's printf, C/Go/Lua can't perform compile-time diagnosis of simple syntax errors in the novel formatting language, so you need test coverage for every invocation. Or a code-walker.

2) But Lua and Go have an obvious pattern to syntax-check little languages: compile them in the module initializer. It is good to do this textually close to their use; Lua kinda needs something like the "static" syntax for this. Since most C environments do have some form of initialization segment this could be done in a non-portable way.
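One non-portable way to get a C "module initializer" is the GCC/Clang `constructor` function attribute, which runs a function before `main`. The sketch below assumes that extension; `struct compiled_spec`, `compile_spec`, `greeting_spec`, and the tiny set of accepted conversions are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "compiled" form of a format spec: the source text plus the
   number of arguments it consumes. */
struct compiled_spec { const char *src; int nargs; };

/* Syntax-check a spec that may use only %s %g %u %o %x %%.
   Returns 0 on success, -1 on an unknown or truncated conversion. */
static int compile_spec(struct compiled_spec *cs, const char *src)
{
    assert(cs != NULL && src != NULL);
    cs->src = src;
    cs->nargs = 0;
    for (const char *p = src; *p; p++) {
        if (*p != '%') continue;
        p++;
        if (*p == '%') continue;               /* literal percent sign */
        if (*p == '\0' || strchr("sguox", *p) == NULL) return -1;
        cs->nargs++;
    }
    return 0;
}

/* The spec sits next to the code that uses it and is checked once at load time. */
static struct compiled_spec greeting_spec;

__attribute__((constructor))                   /* GCC/Clang extension, not ISO C */
static void init_specs(void)
{
    if (compile_spec(&greeting_spec, "%s is %u years old\n") != 0) {
        fprintf(stderr, "bad format spec: %s\n", greeting_spec.src);
        abort();                               /* fail at startup, not at first use */
    }
}
```

A typo in the spec then stops the program at startup, whether or not any test ever reaches the call that uses it.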

3) I think it is bad that trying to accomplish the same task with object literals is more annoying than using crappy little languages in C/Go/Lua strings. Modern languages should make it easier to structure data than to write something like regexps.[3] I don't know how to fix this, but E4X has been pushed out of the blimp.

4) The "no macros" theorem says Lua is not gonna see any domain-specific dynamism in the compiler, so any improvements to this situation are in one of two places:

4.1) Better object literal syntax. But look at lpeg or your favorite data definition hacks; Lua is already very flexible. (The biggest missing piece is domain-specific control flow, but I was talked out of full-strength short lambdas and I am lost.)

4.2) Lua can deal with load-time compilation of little languages fairly nicely. So even the string route is not so bad in Lua. The style I've been calling Localized Lua had some don't-repeat-yourself violations anyway, and they do discourage the use of lexicals in those string-based languages.

And what started this whole mess was Go's aspirations to be a mid- to high-level language, and kinda-sorta doing a good thing with html templates but completely losing on *date formatting*. I can put up with this in C because I already have to be in full obsessive-compulsive mode. (People who write extensive string-handling applications in C stdlib are mad or bad.) But Go wants to be better, and it being proud of its approach hit my reset button: in 2013, not "what's better" but "what's good".

Jay

[1]: Well, OK, there is at least one way to avoid inadvertent pointer forgery. Everybody has some union of all the interesting value types lying around. Just replace the word "union" with "struct"; 0-initialize that and use it for each argument. Misinterpretation of an argument will give a 0 since you're reading/writing in the wrong place. But then somebody cancels your project since every printf/scanf callpoint not only uses an allocated descriptor but zots all the L1 cache. Surely at some point the cost of writing a fmt-language parser/checker into the compiler becomes cheaper. But it's not secure from screwups by vprintf-like intermediaries, so you're back to fmt_pair type/value tuples.
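A small sketch of the union-to-struct trick, with hypothetical names (`struct fmt_slot`, `slot_double`, `slot_string`, `slot_as_string`) and only three value types. Each slot is zero-initialized and exactly one member is written, so reading it as the wrong type yields 0 or NULL, not reinterpreted bits.

```c
#include <assert.h>
#include <stddef.h>

/* A struct where a union would normally be used: every type has its own field. */
struct fmt_slot {
    double d;
    unsigned int u;
    const char *s;
};

static struct fmt_slot slot_double(double d)
{
    struct fmt_slot a = {0};   /* all members zero, s is a null pointer */
    a.d = d;
    return a;
}

static struct fmt_slot slot_string(const char *s)
{
    struct fmt_slot a = {0};
    a.s = s;
    return a;
}

/* Read a slot as a string; a slot filled with another type yields NULL. */
static const char *slot_as_string(const struct fmt_slot *a)
{
    assert(a != NULL);
    return a->s;
}
```

The price is the one the footnote names: each argument now occupies the sum of the member sizes, not the maximum.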

[2]: Syntax checking is one thing; argument type compatibility can almost be memoized by callpoint in C! Add a declaration of a static bool to the fmt_with_spec() macro, pass a reference to it into the library function, and after the first time the library notices that somebody got their args right, set it to true; C function callpoints do not have varying argument types.
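The per-callpoint memoization can be sketched as below. This is an illustration under assumptions, not the fmt_with_spec() macro itself: `FMT_CHECKED`, `fmt_checked`, `spec_matches`, and the `checks_run` counter are hypothetical, the argument types are passed as a string of conversion letters rather than as tagged values, and the spec is assumed to be the same string on every pass through a given callpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int checks_run;         /* counts full checks, for demonstration only */

/* Full check: each conversion letter in spec must equal the next letter in
   tags, e.g. spec "%s %g %u" against tags "sgu". */
static bool spec_matches(const char *spec, const char *tags)
{
    checks_run++;
    for (const char *p = spec; *p; p++) {
        if (*p != '%') continue;
        p++;
        if (*p == '%') continue;               /* literal percent sign */
        if (*p == '\0' || *p != *tags) return false;
        tags++;
    }
    return *tags == '\0';                      /* reject leftover arguments */
}

/* Library entry point: skip the full check once this callpoint is known good. */
static bool fmt_checked(bool *known_good, const char *spec, const char *tags)
{
    assert(known_good != NULL);
    if (!*known_good) {
        if (!spec_matches(spec, tags)) return false;
        *known_good = true;                    /* remember success for this callpoint */
    }
    return true;
}

/* Each expansion declares its own static flag, so the memo is per callpoint. */
#define FMT_CHECKED(ok, spec, tags) do { \
    static bool known_good_; \
    (ok) = fmt_checked(&known_good_, (spec), (tags)); \
} while (0)
```

Calling `FMT_CHECKED(ok, "%s %g %u\n", "sgu")` in a loop runs `spec_matches` only on the first iteration; a callpoint that fails is rechecked on every pass, which is harmless because it keeps reporting the error.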

[3]: Except perl5 regexps *are* part of the language.