lua-users home
lua-l archive



tl;dr: if you have all the constraints C did 25 years ago, printf/scanf are reasonable. If you take Go's concrete syntax for expressing structured data as a given, using string patterns seems more ergonomic. I don't see why we should be constrained by either in 2013, but I don't have any concrete solution other than for Lua. E4X, RIP.

On Aug 13, 2013, at 5:28 PM, William Ahern wrote:

> On Tue, Aug 13, 2013 at 02:18:43AM -0400, Jay Carlson wrote:
>> On Aug 8, 2013, at 9:37 AM, steve donovan wrote:
>> 
>>> On Thu, Aug 8, 2013 at 3:20 PM, Lorenzo Donati <lorenzodonatibz@tiscali.it> wrote:
>>>> fmt.Println(time.Now().Format("2006-01-02 03:04"))
>> 
>>> It is very .. eccentric. Rob Pike is very proud of this idea ;)
>> 
>> It is a cute idea, but just like printf, it's a symptom of a lack of expressive power at compile-time.
>> [...]
>> Most C compilers these days have special mechanisms hooked up to the
>> printf/scanf functions to warn of mismatches in type between the format
>> string and the arguments. So there's a special case again: you can't write
>> your own replacements with the same functionality, especially if you are
>> in the -Werror camp, where warnings are treated as errors.
> 
> You can get close. Using C99's variable argument macros and C11's _Restrict,
> you can translate at compile time each argument to a pair of arguments--type
> and value. This type information can then be used at run time by the printf
> implementation.

Hmm, _Generic? OK, there's a couple hours wasted:

===
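#include <stdarg.h>
#include <stdio.h>

enum fmt_type { fmt_Sentinel, fmt_LongDouble, fmt_Double, fmt_Uint, fmt_String };
union fmt_everything { long double myldouble; double mydouble; unsigned int myuint; const char *mystring; };
typedef struct fmt_pair { enum fmt_type t; union fmt_everything v; } fmt_pair;
fmt_pair fmt_long_double(long double X) { return (fmt_pair){ fmt_LongDouble, { .myldouble = X }}; }
fmt_pair fmt_double(double X) { return (fmt_pair){ fmt_Double, { .mydouble = X }}; }
fmt_pair fmt_unsigned_int(unsigned int X) { return (fmt_pair){ fmt_Uint, { .myuint = X }}; }
fmt_pair fmt_string(const char *X) { return (fmt_pair){ fmt_String, { .mystring = X }}; }

// _Generic picks a function by the static type of X, then (X) calls it; string literals decay to char *.
#define f_(X) _Generic((X), \
  long double: fmt_long_double, \
  double: fmt_double, \
  unsigned int: fmt_unsigned_int, \
  char *: fmt_string, \
  const char *: fmt_string)(X)
// The preprocessor can't recurse: count the args, dispatch to a fixed-arity map (up to 4 here).
#define fmt_n_(_1, _2, _3, _4, N, ...) N
#define fmt_args_list(...) fmt_n_(__VA_ARGS__, fmt_map4, fmt_map3, fmt_map2, fmt_map1, _)(__VA_ARGS__)
#define fmt_map1(X) f_(X)
#define fmt_map2(X, ...) f_(X), fmt_map1(__VA_ARGS__)
#define fmt_map3(X, ...) f_(X), fmt_map2(__VA_ARGS__)
#define fmt_map4(X, ...) f_(X), fmt_map3(__VA_ARGS__)
#define fmt_args(...) fmt_args_list(__VA_ARGS__), (fmt_pair){ fmt_Sentinel }
// Before C23 a variadic function needs one named parameter, hence the dummy 0.
#define fmt(...) fmt_body(0, fmt_args(__VA_ARGS__))
void fmt_body(int unused, ...) { /* all args after `unused` are of type fmt_pair */
  va_list ap; va_start(ap, unused);
  for (fmt_pair p = va_arg(ap, fmt_pair); p.t != fmt_Sentinel; p = va_arg(ap, fmt_pair))
    switch (p.t) {
      case fmt_LongDouble: printf("%Lg ", p.v.myldouble); break;
      case fmt_Double: printf("%g ", p.v.mydouble); break;
      case fmt_Uint: printf("%u ", p.v.myuint); break;
      case fmt_String: printf("%s ", p.v.mystring); break;
      default: break; /* fmt_Sentinel ends the loop before getting here */
    }
  va_end(ap); putchar('\n');
}
#define fmt_with_spec(spec, ...) fmt_spec_body(spec, fmt_args(__VA_ARGS__))
void fmt_spec_body(const char *spec, ...) { (void)spec; /* checks "%d" string against fmt_pairs */ }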
===

Usage: 
  fmt("yarrrgh", 12.7, (unsigned int) 4000000000);
  fmt_with_spec("%s %g %u\n", "yarrrgh", 12.7, (unsigned int) 4000000000);

Since one of the rules of C still seems to be "you cannot write apply()", you can't actually use printf to implement fmt: there's no way to put the argument list back together again. Appropriate macrology to leave the raw args at the end of a fmt-like call is left as an exercise for any readers left.

Admittedly if my intent was to return better error messages at compile-time, this seems unlikely to do so.

> But that's rather ugly. Personally I think C's printf is an elegant
> interface to a thorny problem for statically typed, compiled languages that
> lack a dynamic execution capability at compile time.

C and obviously C++ have a fair amount of dynamic execution capability at compile-time, but they just don't want to admit it. Go has less, but has better declaration and literal syntax.

> The string template itself is not only a solution to a problem--dynamic
> string composition--but a feature all by itself. That's why the concept is
> often copied by so many other languages; specifically, concise format
> specifications that are filled in by values specified elsewhere, and the use
> of tiny domain-specific languages.

Yes, like how people write dynamic queries in SQL.

  char *q; asprintf(&q, "SELECT * FROM Students WHERE NAME='%s';", bobby);

This is probably the most famous example (and has an xkcd to slap people with), but it's a problem for any string in a structured format. String interpolation is fine for report generation for line printers[1], but most people do little of that. Most strings seem to be for other programs to consume, and in that case I assume string concatenation to be awaiting a CVE candidate number until proven otherwise.

Compare this to a compiled formatter:

q = SqlStatement$"SELECT * FROM Students WHERE NAME=$bobby"
print( q() )

Note that in my $"" syntax, $bobby is *not* interpolated; it's the job of the statement compiler to do whatever it wants. So, unwrapping:

q = SqlStatement( {"SELECT * FROM Students WHERE NAME=$bobby", bobby=bobby} )

Presumably this is turned into a prepared statement in the process, giving something like

  return function()
    return conn:prepare("SELECT * FROM Students WHERE NAME=?", bobby)
  end

(In Lua, SqlStatement would at least memoize the compiled form of the string, which is constant, regardless of what happens to Bobby Tables. There's a more clever thing to do with a foo:$"" syntax which does not allocate a table, but I haven't implemented it yet.)

Parsing the template at runtime is what http://golang.org/pkg/html/template/#hdr-Security_Model actually does, although it can't refer to lexical variables either.

Jay
[1]: I couldn't figure out how to write games in RPG-II, so I wasn't that interested.