[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: String matching on keys of a table
- From: Francisco Olarte <folarte@...>
- Date: Sun, 27 Oct 2013 18:38:37 +0100
Hi Meino:
On Sun, Oct 27, 2013 at 4:13 PM, <meino.cramer@gmx.de> wrote:
> Thank you for your help! :)
My pleaseure. But given the new information you've now revelaed it was
not the apropiate solution.
> Parts of the information about the contents of the data are not
> public available (like "closed source"), so I do have to be brief.
Now, you do not have to be brief, you just have to avoid revealing the
private information, ehich is probably not interesting.
> As I said, the size of the data storage has to be considered, as this
> is an embedded system with 512 MB. Means: If there are two algorith
> with the same result: Take the one with a smaller memory footprint.
This is where you should have started. The 512Mb is totally irelevant,
but the last sentence "If there are two algorith
with the same result: Take the one with a smaller memory footprint"
means you are going to have a hard time beating linear search ( the
'with the same result worries me, I thought your problem was
deterministic, so every correct algorithm must yield the same result
).
> There are roughly about 175000 keys, which should be searchable with
> substrings which not necessarily start at the beginning of the
> original key. That is: Key> Abrakadabra Search> kada
That is where you should have started. Some info like avg / min / max
key length, as well as the alphabet, would be nice. I mean, if you
have something like 'The keys cannot have nulls & I can spare memory
for a copy of the keys to speed up the search' means we could have
sugested things like 'Make a big string with the keys separated by
nulls, then search for the key as substring, in every foud point
search for nulls before and after and you've got the key. Do not
forget to put a nul sentinel at start and end'.
Anyway, if you have as a precondition 'Data is in a table and 1st
priority is space', coupled with 175000 keys, this nearly excludes any
solution with an auxilary data structure, ( I do not know if you do
not care about time, which can be the case given the small number of
keys ). Given this, a linear search, carefully coded, is nearly
optimum.
Francisco Olarte.