String Recipes


Here are proposed solutions for doing various types of common string manipulations in Lua.

Substring matching

Check if string X starts or ends with string Y

local function starts_with(str, start)
   return str:sub(1, #start) == start
end

local function ends_with(str, ending)
   return ending == "" or str:sub(-#ending) == ending
end
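A short usage sketch for the two helpers above (the sample strings are made up for illustration). It also shows why ends_with needs the ending == "" guard: str:sub(-0) is the same as str:sub(0), which returns the whole string rather than an empty one.

```lua
local function starts_with(str, start)
   return str:sub(1, #start) == start
end

local function ends_with(str, ending)
   return ending == "" or str:sub(-#ending) == ending
end

print(starts_with("hello.lua", "hello"))  --> true
print(ends_with("hello.lua", ".lua"))     --> true
print(ends_with("hello.lua", ".txt"))     --> false
print(ends_with("hello.lua", ""))         --> true (thanks to the guard)
print(("hello.lua"):sub(-0))              --> hello.lua
```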

Trim (remove initial/trailing whitespace)

See StringTrim.

Changing case

Change the first character of a word to upper case

str = str:gsub("^%l", string.upper)

Change the first alphabetic character of a word to upper case

str = str:gsub("%a", string.upper, 1)

Put HTML tags in lowercase (leaving attribute names untouched)

str = str:gsub("<[^%s>]+", string.lower)

Change an entire string to Title Case (i.e. capitalise the first letter of each word)

local function tchelper(first, rest)
   return first:upper()..rest:lower()
end
-- Add extra characters to the pattern if you need to. _ and ' are
--  found in the middle of identifiers and English words.
-- Keeping %w, _ and ' together in one set, [%w_'], makes the pattern
-- treat ordinary and extra characters alike as the rest of a word.
-- Note: this also capitalises the x in hex numbers, e.g. 0xa7d4 becomes 0Xa7d4
str = str:gsub("(%a)([%w_']*)", tchelper)

Example:

> str = "foo"
> str = str:gsub("^%l", string.upper)
> =str
Foo
> str = "_foo"
> str = str:gsub("^%l", string.upper)
> =str
_foo
> str = str:gsub("%a", string.upper, 1)
> =str
_Foo
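The tag-lowercasing and Title Case recipes above can be checked the same way. This is a minimal sketch with made-up sample strings; tchelper is repeated so that the snippet runs on its own.

```lua
local function tchelper(first, rest)
   return first:upper()..rest:lower()
end

-- Lowercase tag names only; the attribute name CLASS is left alone.
local html = "<DIV CLASS='x'>Hi</DIV>"
html = html:gsub("<[^%s>]+", string.lower)
print(html)   --> <div CLASS='x'>Hi</div>

-- Title Case; the apostrophe stays inside the word.
local title = "the qUICK brown fox's den"
title = title:gsub("(%a)([%w_']*)", tchelper)
print(title)  --> The Quick Brown Fox's Den

-- The hex-number side effect mentioned in the comments.
local hex = ("0xa7d4"):gsub("(%a)([%w_']*)", tchelper)
print(hex)    --> 0Xa7d4
```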

Convert words in all upper-case to lower-case

str = str:gsub("%f[%a]%u+%f[%A]", string.lower)

Note the use here of the "frontier" pattern %f. Without it, matching on a word boundary is hard, especially where the boundary falls at the start or end of the string. Try it on the string "AAA bbb CCC dddEEE FFFhhh JJJ". For more details read about the FrontierPattern.
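Running the suggested test string through the pattern shows the effect. For comparison, this sketch also applies a plain "%u+" pattern, which is not part of the recipe and is included only to show what the frontier adds.

```lua
local str = "AAA bbb CCC dddEEE FFFhhh JJJ"

-- With frontiers: only whole words made entirely of upper-case letters change.
local with_frontier = str:gsub("%f[%a]%u+%f[%A]", string.lower)
print(with_frontier)     --> aaa bbb ccc dddEEE FFFhhh jjj

-- Without frontiers: every run of upper-case letters changes.
local without_frontier = str:gsub("%u+", string.lower)
print(without_frontier)  --> aaa bbb ccc dddeee fffhhh jjj
```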

Splitting a string into a list of substrings

Break the original string on occurrences of some separator character, character set, or pattern.

See SplitJoin.

Iterate over words in a string (adapted from Lua manual)

-- words and numbers
for word in str:gmatch("%w+") do ... end

-- identifiers in typical programming languages
for id in str:gmatch("[_%a][_%w]*") do ... end

-- whitespace-separated components (without handling quotes)
for id in str:gmatch("%S+") do ... end

Iterate over lines in a buffer, ignoring empty lines

(works for both DOS and Unix line ending conventions)

for line in str:gmatch("[^\r\n]+") do ... end

Any of the above can also be done as a function iterator:

-- call func with each word in a string
str:gsub("%w+", func)
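The loops above leave the body open. A small runnable sketch, using a made-up sample buffer, that collects the matches into tables and counts words with the gsub form:

```lua
local text = "alpha beta_1  gamma\r\n\r\ndelta 42\n"

-- Collect every match of a pattern into a list (hypothetical helper).
local function collect(str, pattern)
   local items = {}
   for item in str:gmatch(pattern) do
      items[#items + 1] = item
   end
   return items
end

local words = collect(text, "%w+")
local ids   = collect(text, "[_%a][_%w]*")
local lines = collect(text, "[^\r\n]+")

print(table.concat(words, ","))  --> alpha,beta,1,gamma,delta,42
print(table.concat(ids, ","))    --> alpha,beta_1,gamma,delta
print(#lines)                    --> 2 (the empty line is skipped)

-- gsub as an iterator: the callback returns nothing, so the text is unchanged.
local count = 0
text:gsub("%w+", function(word) count = count + 1 end)
print(count)                     --> 6
```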

Text Wrapping

Wrap a string at a given margin

This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.)

function wrap(str, limit, indent, indent1)
   indent = indent or ""
   indent1 = indent1 or indent
   limit = limit or 72
   local here = 1-#indent1
   local function check(sp, st, word, fi)
      if fi - here > limit then
         here = st - #indent
         return "\n"..indent..word
      end
   end
   return indent1..str:gsub("(%s+)()(%S+)()", check)
end
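A usage sketch for wrap with a made-up sentence. The function is repeated (as a local) so that the snippet runs on its own. The second call assumes indent1 is meant for the first line and indent for the continuation lines, which is how the code above uses them.

```lua
local function wrap(str, limit, indent, indent1)
   indent = indent or ""
   indent1 = indent1 or indent
   limit = limit or 72
   local here = 1-#indent1
   local function check(sp, st, word, fi)
      if fi - here > limit then
         here = st - #indent
         return "\n"..indent..word
      end
   end
   return indent1..str:gsub("(%s+)()(%S+)()", check)
end

local text = "The quick brown fox jumps over the lazy dog"

print(wrap(text, 10))
--> The quick
--> brown fox
--> jumps over
--> the lazy
--> dog

print(wrap(text, 12, "  ", "* "))
--> * The quick
-->   brown fox
-->   jumps over
-->   the lazy
-->   dog
```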

Reflowing text into paragraphs

This builds on wrap to do a quick-and-dirty reflow: paragraphs are defined as lines starting with a space, or having a blank line between them:

function reflow(str, limit, indent, indent1)
   return (str:gsub("%s*\n%s+", "\n")
              :gsub("%s%s+", " ")
              :gsub("[^\n]+",
                    function(line)
                       return wrap(line, limit, indent, indent1)
                    end))
end

Repetition

Check if a string is a repetition of some pattern

str:gsub(pat, "") == ""

Check if a string is a repetition of some pattern separated by whitespace

not str:gsub(pat, ""):find"%S"

Check if a string is a repetition of some pattern and also satisfies some condition

str:gsub(pat, function(s) return ok(s) and "" or "*" end) == ""
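The three checks can be wrapped in small functions and tried on sample input. The function names and the is_even condition below are hypothetical, chosen only for illustration.

```lua
-- True if str consists only of back-to-back matches of pat.
local function is_repetition(str, pat)
   return str:gsub(pat, "") == ""
end

-- True if only whitespace is left once the matches of pat are removed.
local function is_spaced_repetition(str, pat)
   return not str:gsub(pat, ""):find("%S")
end

-- True if str is a repetition of pat and every match satisfies ok.
local function is_checked_repetition(str, pat, ok)
   return str:gsub(pat, function(s) return ok(s) and "" or "*" end) == ""
end

local function is_even(s) return tonumber(s) % 2 == 0 end

print(is_repetition("abcabcabc", "abc"))                  --> true
print(is_repetition("abcabx", "abc"))                     --> false
print(is_spaced_repetition("12 34  56", "%d+"))           --> true
print(is_spaced_repetition("12 3a 56", "%d+"))            --> false
print(is_checked_repetition("123456", "%d%d", is_even))   --> true
print(is_checked_repetition("123556", "%d%d", is_even))   --> false
```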

Formatting

Interpolating variables into strings (string formatting)

Many languages provide a concise way to format variables into strings. The example below uses a Python-style % operator, which stock Lua does not define for strings:

print( "%-5.5s is %5.2f" % { "pi", math.pi } ) --> pi    is  3.14

See StringInterpolation for ways to do this in Lua.
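One possible way to make the example line run, sketched here under the assumption that changing the shared string metatable is acceptable in your program, is to define a __mod metamethod that forwards to string.format. This is only an illustration and not necessarily the approach StringInterpolation recommends. Because it affects every string, it also replaces the arithmetic meaning of % for numeric strings.

```lua
-- Hypothetical sketch: give strings a Python-style % operator.
-- This changes the metatable shared by ALL strings, so use with care.
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

getmetatable("").__mod = function(fmt, args)
   if type(args) == "table" then
      -- A table supplies several format arguments.
      return fmt:format(unpack(args))
   end
   -- Anything else is passed as the single argument.
   return fmt:format(args)
end

print("%-5.5s is %5.2f" % { "pi", math.pi })  --> pi    is  3.14
print("%d items" % 3)                         --> 3 items
```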

URL/E-Mail/Web Processing

Note: see also CgiUtils.

Decode a URL-encoded string

(Note that you should only decode a URL string after splitting it; this allows you to correctly process quoted "?" characters in the query string or base part, for instance.)

function url_decode(str)
   str = str:gsub("%+", " ")
   str = str:gsub("%%(%x%x)", function(h)
      return string.char(tonumber(h,16))
   end)
   str = str:gsub("\r\n", "\n")
   return str
end
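The advice to split before decoding can be illustrated with a query string whose values contain encoded "&" and "=" characters. parse_query is a hypothetical helper written for this sketch, and url_decode is repeated (as a local) so that the snippet runs on its own.

```lua
local function url_decode(str)
   str = str:gsub("%+", " ")
   str = str:gsub("%%(%x%x)", function(h)
      return string.char(tonumber(h, 16))
   end)
   str = str:gsub("\r\n", "\n")
   return str
end

-- Hypothetical helper: split on & and = first, then decode each piece.
local function parse_query(query)
   local params = {}
   for key, value in query:gmatch("([^&=]+)=([^&]*)") do
      params[url_decode(key)] = url_decode(value)
   end
   return params
end

local query = "title=Tom+%26+Jerry&expr=a%3Db"
local params = parse_query(query)
print(params.title)  --> Tom & Jerry
print(params.expr)   --> a=b

-- Decoding first would blur the structure: the separators and the
-- decoded data characters can no longer be told apart.
print(url_decode(query))  --> title=Tom & Jerry&expr=a=b
```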

URL-encode a string

function url_encode(str)
   if str then
      str = str:gsub("\n", "\r\n")
      str = str:gsub("([^%w %-%_%.%~])", function(c)
         return ("%%%02X"):format(string.byte(c))
      end)
      str = str:gsub(" ", "+")
   end
   return str
end
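A few sample inputs show what the encoder does with reserved characters, unreserved punctuation, and newlines. The inputs are made up, and url_encode is repeated (as a local) so that the snippet runs on its own.

```lua
local function url_encode(str)
   if str then
      str = str:gsub("\n", "\r\n")
      str = str:gsub("([^%w %-%_%.%~])", function(c)
         return ("%%%02X"):format(string.byte(c))
      end)
      str = str:gsub(" ", "+")
   end
   return str
end

print(url_encode("Tom & Jerry"))   --> Tom+%26+Jerry
print(url_encode("a-b_c.d~e"))     --> a-b_c.d~e (left as-is)
print(url_encode("line1\nline2"))  --> line1%0D%0Aline2
print(url_encode(nil))             --> nil
```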

Match Email addresses

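local email = "alex@it-rfc.de"
-- Anchor with ^ and $ so the whole string must match, not just a substring.
-- This is only a rough check: the final %w%w%w?%w? accepts top-level
-- domains of 2 to 4 characters and nothing longer.
if email:match("^[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?$") then
   print(email .. " looks like a valid email address")
end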
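A sketch that wraps the same character classes in a hypothetical looks_like_email helper and tries it on a few made-up inputs. The pattern is anchored with ^ and $ so that the whole string has to match. It remains a rough plausibility check, not a full address validator.

```lua
-- Same character classes as above, anchored at both ends.
local EMAIL_PATTERN = "^[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?$"

-- Hypothetical helper: true if s has the rough shape of an email address.
local function looks_like_email(s)
   return s:match(EMAIL_PATTERN) ~= nil
end

print(looks_like_email("alex@it-rfc.de"))                  --> true
print(looks_like_email("not an email"))                    --> false
-- Text around the address is rejected because of the anchors.
print(looks_like_email("see alex@it-rfc.de for details"))  --> false
-- Limitation: top-level domains longer than 4 characters are rejected.
print(looks_like_email("curator@example.museum"))          --> false
```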

CSV (Comma-Separated Value) Parsing

See CsvUtils.

Last edited December 7, 2018 3:49 pm GMT