• Subject: Re: roy inspired statically typed, functional language for lua?
• From: David Manura <dm.lua@...>
• Date: Wed, 11 Jul 2012 01:01:17 -0400

```On Tue, Jul 10, 2012 at 2:05 AM, Paul K <paulclinger@yahoo.com> wrote:
> Possibly related; what is the state of the type inference logic you've
> implemented in lua-inspect?

To give an idea, consider this contrived example:

local function f(x)
if x == 0 then return "a","a" else return "a","b",3 end
end
local function g()
for ii=1,10 do
local aa = math.sqrt(ii)
local bb = aa > 0
local cc = bb + bb
return (f(cc))
end
return 5
end
local h = g()

`f` is seen to have two exits, and basically the unification of the
two returns implies `f` has return type ("a", `string`, `unknown`),
where `unknown` is the superset of all types.  It's deduced from the
numeric for loop that `ii` is a number.  The signature on `math.sqrt`
is therefore satisfied, and `aa` is of type number.  (If, however,
`ii` were known to be a *unique* number, then the exact value of `aa`
would be determined by actually executing the math.sqrt function,
which is marked as a pure function and therefore safe for it to
evaluate.)  A number compared to a number is a boolean, so `bb` is of
type boolean.  booleans don't support '+' (assuming no funny
debug.setmetatable on booleans), so `c` has type `error` (which you
may alert the user about).  The second return in `g` is recognized as
dead code, so return value of `g` is the result of `(f(cc))`, which is
"a".  So, `h` has value "a".  This is all done without truly executing
the `for` loop, at least in the usual way.

We also determine which variables are immutable, unused, undefined,
global/local, etc.

There are ways to inject type information with special comments
("--!"), if one so desires.  You see this in LuaInspect's own
codebase, where all AST nodes are named with a postfix "ast" in a
sort-of Hungarian notation, and this is made known to the analyzer via
a "--!" comment that indirectly calls  `context.apply_value('ast\$',
ast)` to annotate all variables in lexical scope matching a name
pattern with a given mock type.

This may sound all great.  Well, no, it has the potential to be great.
A lot of stuff in the analysis may be missing, and I don't see myself
working much on this soon.  Contributors welcome.  For example, the
binary operator evaluation ("dobinop") has the knowledge that
arithmetic operations on two booleans is bad, but it currently lacks
the knowledge that indexing a number is bad.  That would be easily
added though.  Also, whether this type system is all theoretically
good by some measure, probably not, but this is a lint tool and it
needs to be useful not perfect.

> From my reading of the code, you've
> attempted to infer the types of values and parameters to assist with
> detecting possibly invalid operations (although I haven't seen many
> warnings related to that).

I don't think you currently make use of this in ZeroBrane.  (The
inferred types/values are stored in ast.value, and the HTML and SciTE
plugins display this on hover or double click via
LI.get_value_details.)

BTW, something like M.show_warnings in ZeroBrane probably would be
good to have as a standard plugin in LuaInspect.  The current plugins
(even the CSV export from the command line) generate output optimized
for consumption by editors rather than by humans using it as a batch
mode friendly lint style tool.  I think some have expected to use
LuaInspect in that latter way, but M.show_warnings is more suitable
for that.

> I'm not too excited about something like Roy for three (partially
>
> 1. I'm fairly happy with what I already have in Lua; I'll be even
> happier camper when some of the type coercion rules go away.

Lacking that, type coercion could be flagged by a static analyzer in
some cases.  More simply, it would also be possible to do this
custom searcher function that rewrites all `a .. b` operations with a
function call `_concat(a,b)` that is then evaluated at runtime and
prohibits conflicting types (the idea is related to
http://lua-users.org/lists/lua-l/2011-12/msg00621.html ).

> 2. I don't see how it helps me with errors like "local foo =
> function()...end; do foo = 123 end". This is something that happened
> to me at least once when I accidentally assigned something to "next"
> without localizing it and it took me a good hour to figure out why my
> code suddenly started failing in a completely unrelated module. In
> fact, I resorted to "git bisect" to find out what really happened.

But in ML style languages, variables cannot be rebound like that.

In LuaInspect, mutable and immutable variables (in terms of binding)
are flagged differently, with the mutables often highlighted in
italic.  Not that someone might really notice this unless their coding
really stand out.  Ocaml's explicit "ref" (on the TODO list of Roy) is
that type of thing.  Some have said that "const" should be a default
qualifier in C++.

> 3. It doesn't go far enough for me. If we're going to implement some
> form of static typing/inference, I'd prefer to see something similar
> to F# that includes the notion of units
> (http://blogs.msdn.com/b/andrewkennedy/archive/2008/08/29/units-of-measure-in-f-part-one-introducing-units.aspx).

I've used that type of thing with a C++ analyzer.  in C/C++/D, a
related topic is "strong typedefs".  It's also related to your point
of preventing implicit type conversions between types that share the
same implementation (e.g. two types that alias to number).

> In terms of the actual implementation, it may be interesting to look
> at how to implement this as an optional component using static type
> inference and hints (possibly in comments to preserve compatibility).

Some options if, unlike Roy, one wants to stay within Lua syntax but
give hints to an analyzer:

Downside: a little awkward ("--[]" looks like minus).

local function get_distance(time --[s])
local rate = 5 --[m/s]
return rate * time
end
return get_distance(5 --[s])

- Hungarian notation:  Upside: versatile.  Downside: could be more
implicit; m_per_s also is more verbose than m/s

local function get_distance(time_s)
local rate_m_per_s = 5
return rate_m_per_s * time_s
end
local time_s = 5
return get_distance(time_s)

- unit constants: Upside: natural syntax.  Downside: how to
specify function parameter types?

local S = 1
local M = 1 -- dummy variables, intended only for analysis

local function get_distance(time)
local rate = 5*M/S
return rate * time
end
return get_distance(5 * S)

```

• References: