- Subject: Capture patterns
- From: "Jeff Wise" <jwise@...>
- Date: Mon, 7 Jan 2008 09:55:54 -0600
Hello,
I am attempting to capture a series of numbers (all dollar amounts). I have
experimented with multiple patterns to no avail. I created this small
program to demonstrate my problem.
I am reading PDF reports and pulling out a description and the dollar amounts
that follow it. My pattern also catches the periods that mark abbreviations in
the description, and I cannot work out how to build a pattern that skips them.
Here's the layout:
Description 0.00 3,587.46 (125,000.00)
The description may contain parentheses or periods. The dollar figures are
in US standard accounting format, where a number enclosed in parentheses is a
negative value. The program below contains sample data that shows the
problem.
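To make the accounting format concrete, here is a minimal sketch of how one captured amount could be turned into a signed number. The helper name to_number is hypothetical and is not part of the program below; it assumes the amount has already been isolated from the line.

```lua
-- Hypothetical helper (not part of the original program): convert an
-- accounting-format amount such as "(125,000.00)" to a Lua number.
function to_number(amount)
  -- Parentheses around the whole amount mean a negative value.
  local negative = string.find(amount, "^%(.*%)$") ~= nil
  -- Drop parentheses, thousands separators and any dollar sign.
  local digits = string.gsub(amount, "[%(%),%$]", "")
  local value = tonumber(digits)
  if value and negative then
    value = -value
  end
  return value   -- nil if the text was not a number
end

print(to_number("0.00"), to_number("3,587.46"), to_number("(125,000.00)"))
```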
--
function extract(string1)
  --
  -- culls the string at the first digit in the report--
  --
  local s, e = string.find(string1, "[%(]*%d+")
  if s == nil then return nil end
  local item = string.sub(string1, 1, s - 1)
  return item
end
--
--
--
function trim(s)
  return (string.gsub(s, "^%s*(.-)%s*$","%1"))
end
--
--
-- 10/29/07 handle negatives in pattern. DOS uses "-" at end of number
--
local record = nil
local i = 0
local nums = {}
local line_data = {}
--
--/////////////////////////////////////////////////////////
--
local pattern = "[%-%(%$]*[%d%,]*[%d%.%d%d]+[%)%-]*"
--
--/////////////////////////////////////////////////////////
--
print("****************************Debug13*********************************")
print("debug13- Version 1.0 1/07/08")
print("debug13- Build CSV data from PDF listing of spreadsheet.", "\n\n")
line_data[1] = "This is good data 4.00 3.99 (1,768.50)"
line_data[2] = "Data- In a line 5.00 (100,000.00) 957,123.45"
line_data[3] = "Repairs- () (Wages) 123,456.99 28,123.45 650.00"
line_data[4] = "Repairs- Ex. Wages 50,120.00 500.00 1,000.00"
line_data[5] = "A bunch of negatives (123,456.89) (123,456.90) (123,456.91)"
for i = 1, #line_data do
  for num in string.gmatch(line_data[i], pattern) do
    nums[#nums + 1] = num
  end --do for
  print(#nums, " <=== Number of captured numbers")
  if #nums == 3 then
    local descr = trim(extract(line_data[i]))
    print(descr, nums[1], nums[2], nums[3])
  end --if
  if #nums == 4 then
    local descr = trim(extract(line_data[i]))
    print(descr, nums[1], nums[2], nums[3], nums[4])
  end --if
  nums = {}
  print("----------------Loop Separator-------------")
end --do
print("debug13- End of execution")
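
For comparison, a stricter pattern appears to avoid the stray periods, because it requires a digit before the decimal point and exactly two digits after it. This is only a sketch under that assumption; it does not handle the DOS-style trailing "-" or a leading "$", and the names amount_pattern and split_line are made up for illustration.

```lua
-- Sketch: optional "(", one digit, more digits or commas, ".", two digits,
-- optional ")". Assumes every amount has exactly two decimal places.
amount_pattern = "%(?%d[%d,]*%.%d%d%)?"

-- Hypothetical helper: returns the trimmed description and a table of the
-- amounts found on the line.
function split_line(line)
  local amounts = {}
  for amount in string.gmatch(line, amount_pattern) do
    amounts[#amounts + 1] = amount
  end
  -- The description is everything before the first amount
  -- (the whole line if no amount was found).
  local first = string.find(line, amount_pattern)
  local descr = first and string.sub(line, 1, first - 1) or line
  descr = string.gsub(descr, "^%s*(.-)%s*$", "%1")
  return descr, amounts
end

local descr, amounts = split_line("Repairs- Ex. Wages 50,120.00 500.00 1,000.00")
print(descr, #amounts, amounts[1], amounts[2], amounts[3])
```

Locating the first amount with the same pattern, rather than cutting at the first digit, also keeps digits that happen to appear inside the description.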