String Indexing

In some languages, e.g. Python, C and Pascal, you can write a[5] for the fifth (or maybe the sixth) character of the string a. Not in Lua. You have to write a:sub(5,5) or string.sub(a,5,5). Can we do something about it?

From Lua 5.1 on, yes. Thus:

getmetatable('').__index = function(str,i) return string.sub(str,i,i) end
-- demo
a='abcdef'
print(a[4])      --> d
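As a small sketch of how this behaves at the edges: because the metamethod simply forwards to string.sub, negative indices count from the end, and an out-of-range index yields an empty string rather than nil. The snippet repeats the __index definition so that it stands alone.

```lua
-- Same __index as above, repeated so the snippet stands alone.
getmetatable('').__index = function(str, i) return string.sub(str, i, i) end

local a = 'abcdef'
print(a[1])         --> a
print(a[-1])        --> f    (negative indices count from the end)
print(a[10] == '')  --> true (out of range gives '', not nil)
```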

But what about substrings, say a[3,5]? No, that's illegal. We have to use the __call metamethod instead.

getmetatable('').__call = string.sub
-- demo
a='abcdef'
print(a(3,5))    --> cde
print(a(4))      --> def -- equivalent to a(4,-1)
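A few more calls, as an illustrative sketch: since __call is string.sub itself, negative indices work as usual. Note that a string literal has to be wrapped in parentheses before it can be called.

```lua
-- Same __call as above, repeated so the snippet stands alone.
getmetatable('').__call = string.sub

local a = 'abcdef'
print(a(-3))             --> def  (last three bytes)
print(a(2, -2))          --> bcde (drop the first and last byte)
print(('abcdef')(3, 5))  --> cde  (a literal must be parenthesized to be called)
```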

Let's get really fancy and implement a suggestion of Luiz himself. [1]

getmetatable('').__index = function(str,i) return string.sub(str,i,i) end
getmetatable('').__call = function(str,i,j)
  if type(i)~='table' then
    return string.sub(str,i,j)
  else
    -- i is a table of indices: pick one byte per entry
    local t={}
    for k,v in ipairs(i) do t[k]=string.sub(str,v,v) end
    return table.concat(t)
  end
end
-- demo
a='abcdef'
print(a[4])       --> d
print(a(3,5))     --> cde
print(a{1,-4,5})  --> ace

So there you have it: one-byte substrings with square brackets, from-to substrings with round brackets, selected bytes with curly braces.

Note: with this simple __index function you lose the ability to call methods on strings, such as a:match('abc'), because method lookup goes through the same __index. To keep the methods, modify __index as follows:

getmetatable('').__index = function(str,i)
  if type(i) == 'number' then
    return string.sub(str,i,i)
  else
    return string[i]
  end
end
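A quick check, as a sketch, that numeric indexing and ordinary string methods coexist under this definition (the definition is repeated so the snippet stands alone):

```lua
getmetatable('').__index = function(str, i)
  if type(i) == 'number' then
    return string.sub(str, i, i)   -- numeric key: one-byte substring
  else
    return string[i]               -- any other key: fall back to the string library
  end
end

local a = 'abcdef'
print(a[4])            --> d
print(a:match('c.e'))  --> cde
print(a:upper())       --> ABCDEF
```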

If you don't like that, you can omit the redefinition of __index and use a{4} instead of a[4].

Characters versus bytes

Always remember that these indexing functions select bytes, not characters. For example, UTF-8 characters occupy a variable number of bytes: see the discussion ValidateUnicodeString.

Lua = 'Ｌｕａ'       -- fullwidth letters: three bytes each in UTF-8
print (Lua(1,3))    -->   Ｌ
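If character-level access is needed, one possible approach is to split the string into UTF-8 characters first. The following is only a sketch: utf8_chars is a hypothetical helper, not part of the standard library. It assumes the input is valid UTF-8 and does no validation.

```lua
-- Hypothetical helper: split a (valid) UTF-8 string into a table of characters.
local function utf8_chars(s)
  local chars = {}
  -- one non-continuation byte followed by any number of continuation bytes
  for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

local s = "na\195\175ve"   -- "naïve"; \195\175 is the two-byte UTF-8 encoding of ï
local chars = utf8_chars(s)
print(#s)                        --> 6 (bytes)
print(#chars)                    --> 5 (characters)
print(chars[3] == "\195\175")    --> true (the third character is the two-byte ï)
```

Lua 5.3 and later also ship a standard utf8 library, which may be preferable where it is available.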

Last edited July 16, 2011 12:48 pm GMT