String Recipes

Here are proposed solutions for doing various types of common string manipulations in Lua.

Substring matching

Check if string X starts or ends with string Y

local function starts_with(str, start)
   return str:sub(1, #start) == start
end

local function ends_with(str, ending)
   return ending == "" or str:sub(-#ending) == ending
end
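
A brief usage sketch of the two helpers, restated so the snippet runs on its own. The guard for an empty ending matters because str:sub(-0) is the same as str:sub(0), which returns the whole string rather than the empty string.

```lua
local function starts_with(str, start)
   return str:sub(1, #start) == start
end

local function ends_with(str, ending)
   return ending == "" or str:sub(-#ending) == ending
end

print(starts_with("filename.lua", "file"))  --> true
print(ends_with("filename.lua", ".lua"))    --> true
print(ends_with("filename.lua", ".txt"))    --> false
print(ends_with("filename.lua", ""))        --> true
```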

Trim (remove initial/trailing whitespace)

See StringTrim.

Changing case

Change the first character of a word to upper case

str = str:gsub("^%l", string.upper)

Change the first alphabetic character of a word to upper case

str = str:gsub("%a", string.upper, 1)

Put HTML tags in lowercase (but leaves attribute names untouched)

str = str:gsub("<[^%s>]+", string.lower)
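
A small sketch of what this pattern does on a sample fragment (the sample markup is made up for illustration). The match stops at the first whitespace or ">", so only the tag name is lowered. Note that the pattern would also lower any other text that follows a literal "<", so it is probably best treated as a quick fix rather than an HTML parser.

```lua
local html = '<A HREF="index.html">Home</A><BR>'

-- The extra parentheses discard gsub's second result (the match count)
local lowered = (html:gsub("<[^%s>]+", string.lower))

print(lowered)  --> <a HREF="index.html">Home</a><br>
```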

Change an entire string to Title Case (i.e. capitalise the first letter of each word)

local function tchelper(first, rest)
   return first:upper() .. rest:lower()
end
-- Add extra characters to the pattern if you need to. _ and ' are
--  found in the middle of identifiers and English words.
-- We must also put %w_' into [%w_'] to make it handle normal stuff
-- and extra stuff the same.
-- This also turns hex numbers into, eg. 0Xa7d4
str = str:gsub("(%a)([%w_']*)", tchelper)


Examples:

> str = "foo"
> str = str:gsub("^%l", string.upper)
> =str
Foo
> str = "_foo"
> str = str:gsub("^%l", string.upper)
> =str
_foo
> str = str:gsub("%a", string.upper, 1)
> =str
_Foo
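
The Title Case recipe above can be tried out as follows. This is a sketch that assumes the helper upper-cases the first letter and lower-cases the rest of each word, which is what the comment about hex numbers suggests; title_case is a hypothetical wrapper name.

```lua
local function tchelper(first, rest)
   return first:upper() .. rest:lower()
end

-- Hypothetical wrapper; the parentheses drop gsub's match count
local function title_case(str)
   return (str:gsub("(%a)([%w_']*)", tchelper))
end

print(title_case("the quick BROWN fox's 0xA7D4"))
--> The Quick Brown Fox's 0Xa7d4
```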

Convert words in all upper-case to lower-case

str = str:gsub("%f[%a]%u+%f[%A]", string.lower)

Note the use here of the "frontier" pattern %f. Without it, it is hard to match on a word boundary, including where the boundary falls at the start or end of the string being matched. Try it on the string "AAA bbb CCC dddEEE FFFhhh JJJ". For more details, read about the FrontierPattern.
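
To make the effect of %f concrete, here is a sketch that runs the suggested test string through the pattern with and without the frontier anchors. Only words made up entirely of capitals are lowered in the first case.

```lua
local s = "AAA bbb CCC dddEEE FFFhhh JJJ"

-- With frontiers: the match must start and end on a word boundary
local with_frontier = (s:gsub("%f[%a]%u+%f[%A]", string.lower))
print(with_frontier)     --> aaa bbb ccc dddEEE FFFhhh jjj

-- Without frontiers: every run of capitals is lowered, even inside words
local without_frontier = (s:gsub("%u+", string.lower))
print(without_frontier)  --> aaa bbb ccc dddeee fffhhh jjj
```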

Splitting a string into a list of substrings

Break the original string on occurrences of some separator character, character set, or pattern.

See SplitJoin.

Iterate over words in a string (adapted from Lua manual)

-- words and numbers
for word in str:gmatch("%w+") do ... end

-- identifiers in typical programming languages
for id in str:gmatch("[_%a][_%w]*") do ... end

-- whitespace-separated components (without handling quotes)
for id in str:gmatch("%S+") do ... end
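
The three patterns above pick out rather different pieces of the same input. The following sketch collects the matches into a table to compare them; collect is a hypothetical helper and the sample text is made up.

```lua
-- Hypothetical helper: gather every match of pattern into a list
local function collect(str, pattern)
   local out = {}
   for item in str:gmatch(pattern) do
      out[#out + 1] = item
   end
   return out
end

local text = 'x_1 = foo("bar baz", 42)'

print(table.concat(collect(text, "%w+"), "|"))          --> x|1|foo|bar|baz|42
print(table.concat(collect(text, "[_%a][_%w]*"), "|"))  --> x_1|foo|bar|baz
print(table.concat(collect(text, "%S+"), "|"))          --> x_1|=|foo("bar|baz",|42)
```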

Iterate over lines in a buffer, ignoring empty lines

(works for both DOS and Unix line ending conventions)

for line in str:gmatch("[^\r\n]+") do ... end
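
As a quick check of the line iterator, the sketch below feeds it a buffer that mixes DOS and Unix line endings and contains blank lines. The buffer contents are made up for illustration.

```lua
local buffer = "first\r\n\r\nsecond\n\nthird"

local lines = {}
for line in buffer:gmatch("[^\r\n]+") do
   lines[#lines + 1] = line
end

-- Blank lines never match, so only three entries are collected
print(table.concat(lines, ","))  --> first,second,third
```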

Any of the above can also be done as a function iterator:

-- call func with each word in a string
str:gsub("%w+", func)
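
When gsub is used purely as an iterator, the callback can simply return nothing: a nil (or false) result leaves the matched text in place. A minimal sketch, with a made-up word-counting callback standing in for func:

```lua
local counts = {}

-- Called once per word; returning nothing keeps the original text
local function func(word)
   counts[word] = (counts[word] or 0) + 1
end

local str = "the cat and the hat"
local result, n = str:gsub("%w+", func)

print(n)           --> 5
print(counts.the)  --> 2
print(result == str)  --> true
```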

Text Wrapping

Wrap a string at a given margin

This is intended for strings without newlines in them (i.e. after reflowing the text and breaking it into paragraphs.)

function wrap(str, limit, indent, indent1)
   indent = indent or ""
   indent1 = indent1 or indent
   limit = limit or 72
   local here = 1-#indent1
   local function check(sp, st, word, fi)
      if fi - here > limit then
         here = st - #indent
         return "\n"..indent..word
      end
   end
   return indent1..str:gsub("(%s+)()(%S+)()", check)
end
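
A usage sketch for wrap, with the function restated (as a local) so the snippet runs on its own. Here limit is the maximum line width including any indent, indent prefixes continuation lines, and indent1 prefixes the first line; a single word longer than limit is left unbroken.

```lua
local function wrap(str, limit, indent, indent1)
   indent = indent or ""
   indent1 = indent1 or indent
   limit = limit or 72
   local here = 1 - #indent1        -- virtual start column of the current line
   local function check(sp, st, word, fi)
      if fi - here > limit then     -- word would run past the margin
         here = st - #indent
         return "\n" .. indent .. word
      end
   end
   return indent1 .. str:gsub("(%s+)()(%S+)()", check)
end

print(wrap("The quick brown fox jumps over the lazy dog", 15))
--> The quick brown
--> fox jumps over
--> the lazy dog

print(wrap("one two three four", 10, "> "))
--> > one two
--> > three
--> > four
```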

Reflowing text into paragraphs

This builds on wrap to do a quick-and-dirty reflow: paragraphs are defined as lines starting with a space, or having a blank line between them:

function reflow(str, limit, indent, indent1)
   return (str:gsub("%s*\n%s+", "\n")
              :gsub("%s%s+", " ")
              :gsub("[^\n]+",
                    function(line)
                       return wrap(line, limit, indent, indent1)
                    end))
end


Check if a string is a repetition of some pattern

str:gsub(pat, "") == ""

Check if a string is a repetition of some pattern separated by whitespace

not str:gsub(pat, ""):find"%S"

Check if a string is a repetition of some pattern and also satisfies some condition

str:gsub(pat, function(s) return ok(s) and "" or "*" end) == ""
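
The three one-liners above can be wrapped in small predicates to see how they behave. This is a sketch; the function names and sample inputs are hypothetical. Note that the empty string counts as zero repetitions and so passes every check.

```lua
local function is_repetition(str, pat)
   return str:gsub(pat, "") == ""
end

local function is_spaced_repetition(str, pat)
   return not str:gsub(pat, ""):find("%S")
end

-- ok is a caller-supplied predicate applied to each match
local function is_repetition_if(str, pat, ok)
   return str:gsub(pat, function(s) return ok(s) and "" or "*" end) == ""
end

local function at_least_ten(s) return tonumber(s) >= 10 end

print(is_repetition("abcabcabc", "abc"))                 --> true
print(is_repetition("abcab", "abc"))                     --> false
print(is_spaced_repetition("12 34  56", "%d%d"))         --> true
print(is_spaced_repetition("12 3x", "%d%d"))             --> false
print(is_repetition_if("102030", "%d%d", at_least_ten))  --> true
print(is_repetition_if("100530", "%d%d", at_least_ten))  --> false
```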


Interpolating variables into strings (string formatting)

Many languages provide a concise way to format variables into strings. Example:

print( "%-5.5s is %5.2f" % { "pi", math.pi } ) --> pi    is  3.14

See StringInterpolation for ways to do this in Lua.
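
The % operator in the example is not built into Lua strings. One way to get it, sketched here as an assumption about what the example relies on, is to set the __mod metamethod on the shared string metatable so that it forwards to string.format. Because this changes behaviour for every string in the program, it may be better kept to small scripts.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ or Lua 5.1

-- Assumed definition: "fmt" % {args...} calls fmt:format(args...)
getmetatable("").__mod = function(fmt, args)
   if type(args) == "table" then
      return fmt:format(unpack(args))
   end
   return fmt:format(args)  -- single non-table argument
end

print("%-5.5s is %5.2f" % { "pi", math.pi })  --> pi    is  3.14
```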

URL/E-Mail/Web Processing

Note: see also CgiUtils.

Decode a URL-encoded string

(Note that you should only decode a URL string after splitting it; this allows you to correctly process quoted "?" characters in the query string or base part, for instance.)

function url_decode(str)
   str = str:gsub("%+", " ")
   str = str:gsub("%%(%x%x)", function(h)
      return string.char(tonumber(h,16))
   end)
   str = str:gsub("\r\n", "\n")
   return str
end

URL-encode a string

function url_encode(str)
   if str then
      str = str:gsub("\n", "\r\n")
      str = str:gsub("([^%w %-%_%.%~])", function(c)
         return ("%%%02X"):format(string.byte(c))
      end)
      str = str:gsub(" ", "+")
   end
   return str
end
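
A round-trip sketch for the two URL functions, restated as locals so the snippet runs on its own. It uses the form-encoding convention from the recipes, where a space becomes "+" and a newline is sent as CR LF; the sample strings are made up.

```lua
local function url_decode(str)
   str = str:gsub("%+", " ")
   str = str:gsub("%%(%x%x)", function(h)
      return string.char(tonumber(h, 16))
   end)
   str = str:gsub("\r\n", "\n")
   return str
end

local function url_encode(str)
   if str then
      str = str:gsub("\n", "\r\n")
      str = str:gsub("([^%w %-%_%.%~])", function(c)
         return ("%%%02X"):format(string.byte(c))
      end)
      str = str:gsub(" ", "+")
   end
   return str
end

local original = "name=John Doe&x=1"
local encoded = url_encode(original)

print(encoded)                          --> name%3DJohn+Doe%26x%3D1
print(url_decode(encoded) == original)  --> true
print(url_encode("a\nb"))               --> a%0D%0Ab
```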

Match Email addresses

-- Rough check: unanchored, so any address-like substring counts.
if email:match("[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?") then
   print(email .. " is a valid email address")
end
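
Because the pattern above is not anchored, it also succeeds when an address is merely embedded in a longer string. If the whole string should be an address, one possible variant is to anchor the same pattern with ^ and $, as in this sketch (looks_like_email is a hypothetical name). It remains a rough check: for instance, top-level names longer than four characters are rejected.

```lua
local PATTERN = "[A-Za-z0-9%.%%%+%-]+@[A-Za-z0-9%.%%%+%-]+%.%w%w%w?%w?"

-- Hypothetical helper: the whole string must match the pattern
local function looks_like_email(s)
   return s:match("^" .. PATTERN .. "$") ~= nil
end

print(looks_like_email("user@example.com"))             --> true
print(looks_like_email("mail user@example.com today"))  --> false
print(looks_like_email("user@example"))                 --> false

-- The unanchored pattern still finds the embedded address
print(("mail user@example.com today"):match(PATTERN))   --> user@example.com
```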

CSV (Comma-Separated Value) Parsing

See CsvUtils.

See Also

Last edited December 7, 2018 3:49 pm GMT