Counting From One

Contrary to the most popular languages today, Lua array indices represent an ordinal position within the array rather than an offset from the head of the array. (This scheme was unfortunately referred to as "counting from one", leading many to defend it as the natural way to count. Really, the argument is in the use of offsets vs. ordinals to indicate an element within a sequence.)

There is exactly ONE spot in the core of Lua where this is true: the parsing of constructors such as {10,20,30}. The convention of starting with 1 is only enforced by the libraries, which are not part of the language. --lhf

This is no longer true with Lua 5.1, as the # operator is also dependent on a base index of 1.
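A small sketch of the two core behaviours mentioned above (the constructor and the # operator), using only standard Lua. The variable name is illustrative.

```lua
-- The constructor assigns positional values to indices 1, 2, 3.
local t = {10, 20, 30}

print(t[1], t[3])   --> 10   30
print(t[0])         --> nil  (nothing was stored at index 0)
print(#t)           --> 3    (the length operator counts from index 1)
```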

If you are a person referred to as a "scripter" or "non-programmer", here is a call to take a stand regarding future languages. Make it known that you are as perfectly capable of grasping the concept of offsets as any programmer. Lua lists have a base index of 1 because it was thought to be most friendly for non-programmers, as it makes indices correspond to ordinal element positions. However, this dubious concession causes programmers a lot of grief, which in turn decreases their efficiency in providing you with the tools to do your work. One reason is that Lua is tightly coupled with C which uses a base index of 0 for arrays-- the indices represent an offset from the head of the array. This means that programmers working on both Lua and C sides must constantly switch reference in both their code and mind, which is annoying and error prone. Also it turns out that using offsets goes hand in hand with a certain way of specifying ranges where the end is not included (known as a half-open range), while using ordinal positions does not. Half-open ranges allow more natural handling of zero-length ranges which leads to more concise and consistent programs.

Being both a C++ and a Lua programmer, I have to completely disagree with this rant. Counting from 1 is the "natural" way of counting. In math, everyone counts from 1, too. Counting from 0 is only good for the C convention that an array and a pointer are equivalent, and that a[i] is the same as *(a+i). Also, I think it is easy for a programmer to adapt to the non-C-like way of counting. Yet it makes Lua much more intuitive for "casual programmers". --PeterPrade

You're not a C++ programmer. Can't be. You may know how to program in C++, but you aren't a C++ programmer. Just a tourist.

Agreed, it is a rant :-). However, your argument doesn't address the issue of how half-open ranges allow more concise coding, which is my main point. When dealing with ranges, Lua programs will tend to be littered with +1's because Lua does not use half-open ranges, whereas in C++ (especially when using STL, which uses that style of ranges in its entirety) this won't happen. In addition, I agree that counting from one is natural to humans (I never stated otherwise), however representing zero-length ranges with [n, n-1] is very awkward. It is not natural to communicate "take no steps forward" by indicating to someone a start point on the floor, and then a destination point one step behind. In contrast humans can easily understand "take no steps forward" by indicating a start point on the floor, and a destination being that same point-- corresponding with the half-open range [n, n). --JohnBelmonte
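To make the two conventions concrete, here is a minimal sketch. string.sub is standard Lua and takes a closed range; slice is a hypothetical helper (not part of Lua) that takes a zero-based, half-open range instead.

```lua
local s = "abcdef"

-- Lua's string.sub takes a closed range [i, j]; the empty range is [n, n-1].
print(string.sub(s, 3, 4))     --> cd
print(#string.sub(s, 3, 2))    --> 0

-- Hypothetical helper: zero-based, half-open range [i, j).
local function slice(str, i, j)
  return string.sub(str, i + 1, j)
end

print(slice(s, 2, 4))          --> cd
print(#slice(s, 2, 2))         --> 0  (the empty range is [n, n))
```

The length of slice(s, i, j) is simply j - i, while the length of string.sub(s, i, j) is j - i + 1.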

Sometimes half-open ranges allow concise coding and sometimes not. I've seen (and committed) plenty of fencepost errors with both half-open and closed ranges. The fact that the last element in a C array has the index one less than the length of the array, for example, can lead to a variety of -1's in code; it is certain that there are techniques for eliminating these, but I wouldn't say that my Lua code is any more littered with +1's than my C code is with -1's. With respect to half-open ranges, though, there is a serious problem: you need to be able to represent a quantity that is not in the range. Consequently, for example, the type of an index of a string of length 256 needs to be at least short, even though every valid index is a byte. Similarly, a half-open range descriptor to the end of a vector contains an address which is not included in the storage of the vector, and which may well be a valid pointer to a different object; this gets complicated for conservative garbage collecting, for example. (At least one conservative garbage collector deliberately overallocates to compensate for this problem.) I'm not standing up for one or the other: both are valid, both have advantages, and both have disadvantages. --RiciLake

As with counting from one, I think closed ranges are much more intuitive than half-open ranges, at least when you're not talking about the special case of a zero-length range. When I say "intuitive" I mean it is more natural for someone who has not been a programmer for at least some years. --PeterPrade

I have not counted them, but I think that a vast number of programming languages start counting at zero... And as I learned C as my first language, starting at one seems confusing to me; in fact I wrote a bunch of bad Lua code with arrays starting at zero in mind --AdrianPerez

Although counting from zero has its advantages, I find counting from one much more natural, even when programming. It is not a problem for me to switch between Lua and C, as in the past I've switched a lot between Visual Basic and C --Anonymous

I don't find (y - 1) * max_x + x to be any less readable than y * max_x + x --TimothyDowns

First is 1st, not 0th --Kazimir Majorinc

"I'm in the process of evaluating Lua as an embedded scripting language for my app, and everything I have seen up to this point has been very positive, until I realized Lua counts things from index 1. Bad. Bad. Baaad. Bad enough to consider tossing the whole thing out the door. As for justification, even though there shouldn't be a need for any :): the indices of an array of length N map naturally to the ring of integers modulo N, which has a lot of nifty properties. For example, when you want to access your array cyclically, you just do index = (index+1) %N. Doing the same thing with indices starting at 1 is a pain in the neck. Also, it makes binding C routines to Lua utterly painful."

index = index % N + 1 --Milano Carvalho

Exactly, all of a sudden we have a bug and extra work where none needed to be. In addition, the programmer now has to remember "Is index a position and violates modulo preconditions or is it the actual modulo?" Finally, what's the inverse of that? Is it: "modindex = index - 1 % N" or is it "modindex = index % N - 1"? Trick question: it's "modindex = (index - 1) % N". --Andrew Lentvorski
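The formulas in this exchange can be checked with a short sketch. The function names and the value of N are illustrative, not from any library.

```lua
local N = 5

-- One-based: step from position i (1..N) to the next, wrapping N back to 1.
local function next_position(i)
  return i % N + 1
end

-- Convert between one-based positions (1..N) and zero-based offsets (0..N-1).
local function to_offset(i)   return (i - 1) % N end
local function to_position(k) return k % N + 1 end

print(next_position(4), next_position(5))  --> 5   1
print(to_offset(1), to_offset(5))          --> 0   4
print(to_position(to_offset(3)))           --> 3

-- Precedence: % binds tighter than binary -, so the parentheses matter.
print(6 - 1 % N, (6 - 1) % N)              --> 5   0
```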

Agreed. The Lua C API uses conventions which are unnatural to C. For example, when parsing Lua tables into C arrays you have to remember to use iterator+1 to access the table index. --DanHollis


Counting from 0 is not the sticky issue for me personally; counting from 1 is just a symptom of the problem of the paucity of datatypes in Lua, not by any means the sum of the problem. My problem is the lack of datatypes, including true arrays, 1 or 0 base be damned. Tables may have a tightly packed "array" part, but I'm not keen on my 33 word array taking 64 words of space (has to be a power of 2). Don't get me started on the fact that I can't even have an array of bytes, but only massive bloatsome double-precision floats instead (or doubles, or the choice of exactly one (1) numeric datatype that Lua is compiled with). Yes, I'm aware Lua makes it easy to glue to real arrays in C. But then it's not really in Lua, it's some second-class "userdata" type I can't really treat like a real list.

Lua 5.1 provides the lua_createtable() API call which allows you to specify the array size of the table (precisely). You can also specify the hash size, but that part is always restricted to a power of 2. If you create a table with a specified array size, its array part will be precisely that size so long as you do not force the table to expand, so it will work fine if you know the size of the table in advance. Binding arrays of small atomic objects, like ints, is made slightly easier in 5.1 as well, because of the possibility of overriding the # operator, but to make the userdata really act like a Lua table, you need to modify the default definition of ipairs and possibly of unpack. I've put some code for the former on the Wiki at GeneralizedPairsAndIpairs, in case it is of use to anyone. --RiciLake

An excellent point, something I've struggled with too. Interfacing large C structs to Lua, especially C arrays of large structs, is not fun. Lua does not make these mappings straightforward. --DanHollis


The Lua source code has a configurable type lua_Number, set to double by default. In the same way, it might be interesting to define a constant lua_FirstIndex (for instance), set to 1 by default, to refer to the first index. Every part of Lua related to the first index of an array (for instance, ipairs, foreachi, etc.) should use this constant. If a user wants to change this constant to 0 and recompile, he/she could use Lua tables beginning from 0 at his/her own risk of losing compatibility. The same constant should be accessible from Lua, so that scripts can also be portable and independent of the first array index. --Salvador Espana


The question whether counting in computer science should be zero- or one-based is a classic one. I found the arguments in Edsger Dijkstra's article "Why numbering should start at zero" (http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF) very convincing and I became a strong counting-starts-at-0 advocate. Sadly, Lua suggests starting at 1. This makes table/array constructions awkward, as in {[0]=10, 20, 30}, and makes the table-library functions less valuable. --Ulrich Hoffmann <uho@xlerb.de>

Yes, nice article. I especially like the very first sentence: "To denote the subsequence of natural numbers 2, 3, ..., 12 [...]." Now, combine that with the metric that computer languages should primarily communicate their intent to the reader, and only incidentally to the computer, and we see that even for Dijkstra the most natural notation is the inclusive one... Draw your own conclusion.


Counting from one is more "intuitive", but counting from zero is algorithmically much better. This is a programming language for writing algorithms to be executed by a computer, not a language for daily conversation between humans. Programmers are smart, they can figure it out. Don't treat them like children. :) --DanHollis


I second using half-open over closed ranges, and thus counting from 0 over counting from 1. Not only is a zero-length range naturally represented as [n,n), but adjacent ranges also add up nicely -- [a,b)+[b,c)=[a,c) -- without excessive doubling at the ends. The resulting range has length (b-a)+(c-b)=c-a, and either or both of the components can be zero-length. Also, one can consider `negative ranges' [m,n) where m>n, and the addition of ranges still works; it even becomes possible to obtain a zero-length range from non-zero ones.

Now, consider numbers a, b, and x. A correct expression for `x is between a and b' in the sense that min(a,b)<=x<max(a,b) (a half-open range) is (a<=x)=(x<b). Were it a closed range (...<=x<=...), we wouldn't have had such a nice expression. -- Boyko Bantchev
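The expression above translates directly into Lua. The function name between is illustrative.

```lua
-- True when x lies in the half-open range between a and b, in either order:
-- min(a, b) <= x < max(a, b).
local function between(a, b, x)
  return (a <= x) == (x < b)
end

print(between(2, 5, 2), between(2, 5, 5))  --> true   false
print(between(5, 2, 2), between(5, 2, 5))  --> true   false
print(between(3, 3, 3))                    --> false  (empty range)
```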


Personally I don't have a problem with counting from 1. For one thing, the idiom of 1 being the first character in a string and -1 being the last is a nice idea. That wouldn't work if the first character was 0.

As a script language its first priority is to be easy for scripters to use. An array of 5 elements that has the last element as number 5 is surely easier to explain to casual scripters, than that it finishes at element 4?

I have interfaced many client functions in my system to expose their workings to Lua. The question of zero-or-one based doesn't even apply in many cases. For instance, a lot use string keys (in which case the problem goes away), or they don't use an array of any sort.

If you want Lua to be a universal scripting language I certainly would not recommend a compile-time option, so that half the Lua scripts published work based on zero and half based on 1. You would be opening the doors to a nightmare doing that. I can't even see how that can work if you want to interface things like LuaSocket, LuaCom etc. These would be written assuming the current convention, that arrays start at 1, and many are supplied with precompiled binaries for Windows. These would either not work at all if you ran on a zero-based system, or the authors of each package would have to clutter their code with tests for what the base is, surely losing any advantage of changing it in the first place. -- NickGammon


One more point, internally Lua uses the convention of 1 as first and -1 as last in many places. For example:

double num = luaL_checknumber (L, 1); /* get first item on stack */

Thus, programmers who are interfacing Lua with C are very familiar with that convention - you have to be. Again, if you made arrays zero-based, would you change that too? In which case how do you get the last element from the stack (which is currently -1)?

Some of the posts above don't really mention if they are referring to strings as well. For instance, the first item in a string:

c = string.sub ("ABC", 1, 1) --> "A"

Would you make that zero also? If so, how do you get the last item? -- NickGammon
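For reference, a short sketch of how the standard string.sub treats positive and negative indices today:

```lua
local s = "ABC"

print(string.sub(s, 1, 1))    --> A    (first character)
print(string.sub(s, -1))      --> C    (last character)
print(string.sub(s, -2, -1))  --> BC   (last two characters)
print(string.sub(s, 1))       --> ABC  (no end index: through the end of the string)
```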

Using -1 as an alias for the last element is perfectly consistent with half-open ranges (i.e. counting from zero). See Python. In fact, it is more consistent than with closed ranges. Consider for a moment that lists wrap around, and you can move from beginning to end and back the "short way". One left of position 0 (the beginning of the list) is -1 (the end of the list). In other words you can just subtract 1 from your position to move left, which is natural. If you start a list at 1 instead, it produces a strange gap of two positions between the beginning and end. To move to the left, you'd have to subtract one from your position, unless you were at position 1, in which case you'd subtract 2.

Then do you also propose that greatest_index + 1 wrap around to the first element? If not, there would still be these 'strange' jumps when going from the last element (specified with positive index) to the first, so that argument doesn't make any sense.

No-- if you want indexing to wrap, use i modulo n. The simplicity of that equation is one of the attractive points of indexing with offsets.

Yes, that is true. However given that the "deed is done" now, I suggest that applications that need the "count from zero" approach for mathematical or other reasons, simply define their own foreachi function to work around the current behaviour. After all, there is nothing stopping you from putting elements into position 0 of a table right now.

If you want to make a table constructor start at 0 do this:

t = { [0] = "a", "b", "c" }  -- a is in position 0, b is in 1 and so on

-- NickGammon

Unfortunately #t evaluates to 2. Much of the problem arises from the confusion of numerals with ordinals. When indices are offsets, 0 is the offset of the first item in the array. If we were a bit more scrupulous in retaining the type information with the use of numbers in vernacular speech there would be less contention on this. In mathematics, realizing a non-negative integer n as the set of numbers from 0 to n-1 inclusive gives much neater formulae than identifying it with the set of numbers from 1 to n. -- GavinWraith
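A minimal sketch of the workaround suggested above, under the assumption that you are willing to write your own iterator. ipairs0 is a hypothetical name, not a standard function.

```lua
local t = { [0] = "a", "b", "c" }
print(#t)   --> 2, because the length operator only considers indices from 1

-- Hypothetical iterator: like ipairs, but starting at index 0.
local function ipairs0(tbl)
  local i = -1
  return function()
    i = i + 1
    local v = tbl[i]
    if v ~= nil then return i, v end
  end
end

for i, v in ipairs0(t) do print(i, v) end
--> 0   a
--> 1   b
--> 2   c
```

The number of elements in such a table is #t + 1, which is one of the places where a stray +1 creeps back in.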

Though in math, it is common for matrices and vectors to be 1-indexed[1][2][3]. Similarly, in SimpleMatrix, the first element is referenced as mtx[1][1]. Sequences may vary more[4]. Concerning math software, Mathcad 0-indexes by default (though this can be changed), while Matlab, Maple, and Mathematica 1-index. --DavidManura


However, #t (or, table.getn) is not defined as returning the number of elements in a table. For example:

t = { foo = 1, bar = 2 } ; print (table.getn (t))  --> 0

From the Lua manual:

The length of a table t is defined to be any integer index n such that t[n] is not nil and t[n+1] is nil;

My example is consistent with the definition. #t returns the index of the last item.

-- NickGammon
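The same point with the # operator, plus an explicit count for comparison. This is a sketch using only standard Lua; the variable names are illustrative.

```lua
local t = { foo = 1, bar = 2 }
print(#t)        --> 0: there is no element at index 1

-- Counting every key requires walking the table.
local count = 0
for _ in pairs(t) do count = count + 1 end
print(count)     --> 2
```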


Visual Basic and especially JavaScript are two languages similar to Lua. It would be nice to have a low entry barrier for non-technical users migrating from these languages. In JavaScript arrays are indexed from 0. In Visual Basic arrays are indexed from 0 by default. (Changing the default is strongly discouraged and not supported by modern implementations such as VB.Net.)


I was looking to see if Perl is indexed from 0 or 1 (zero it seems), and in my Perl manual found this gem:

An array subscript n, where n is any non-negative integer, always refers to array element n+1.

It is obvious we are not going to reach agreement here, however don't you think that is just slightly confusing? "Subscript n ... refers to element n+1?". At least in Lua, subscript n refers to element n.

-- NickGammon

It's only confusing if your base is 0 and you think of the index as a position. If your base is 0 then the index is best thought of as an offset from the start of the list, not a position. This is the same point just made by Gavin. Generally speaking, offsets are more useful in programming than positions.


"The set of natural numbers" is the set of integer numbers greater or equal to zero in US, and the set of integer numbers greater than zero in Soviet Union. It's like indentation.

IMHO counting from 1 is great. First element is 1, last element is -1. No need to remember "this is not math, this is not something natural, get used to it", like when you deal with indices in python.

-- muntyan


Lua's key advantage is that it is the most embeddable of the scripting languages -- it really isn't the best top-level scripting language for general programming tasks (Python and Perl are better). As an embedded scripting language, it is going to be embedded in compiled languages. Almost all of them use 0-based indexing, so most of the new users are going to have this frame of reference. VB had this debate (and the dynamic option to choose), and from what I've seen, people voted for 0-based indices. So let's just do this and get it over with. -- GRB

E x a c t l y -- JeanClaudeWippler


Please don't write in the first person without signing your name.

So all that number theory down the drain? Gee, you get to be one year old the second you are born and Mr. Clarke was wrong about 2001. Modulo of -0 how helpful. Sorry, but I think that whole idea needs to be rethunked, revised and corrected. No, starting at one is not as natural as one person wrote, it is because zero is intuitively assumed; therefore; need not be stated. But the good old computer does not assume. Just my thoughts ---trav


Last edited May 11, 2008 1:03 am GMT