[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: String matching on keys of a table
- From: meino.cramer@...
- Date: Sun, 27 Oct 2013 16:13:01 +0100
Francisco Olarte <folarte@peoplecall.com> [13-10-27 14:20]:
> Hi:
>
> > is there anything more sophisticated than linear search, which I can
> > do to match a key (string) in a table from a user input, which is only
> > a part of that key?
>
> There are lots of things you can do, but you should define it a little better.
>
> > (the whole thing should run on a embedded system under Gentoo Linux.
> > There are 400MB of mem...not Gigs of it ... ;)))
>
> That is irrelevant if you do not post how many keys are there in the
> table and also which caractereistics do they have, things like wheter
> the keys share common prefixes ( I suppose they do otherwise you'll be
> limited to a trivial 256 maximum different keys ), and how many keys
> there are.
>
> You can use a trie, as suggested by a previous post, but this is gonne
> be big if implemented under native lua.
>
> For the general problem I would use an index table with all the keys
> sorted. As lua shares strings this will only cost you memory for the
> array part of the table, a pointer per key. Then you take the user
> string and do a binary search for it. If binary search finds it, then
> your starting index is the previous slot, if it doesn't then it is the
> insertion point. Then you loop in the table, and while the entry
> starts with the user string you have keys with start with it. An
> example.
>
> sorted keys = 1:'', 2:a, 3:aa, 4:ab, 5:c, 6:x, 7:z11 8:z12 9:z21
>
> If user input is a, binary search finds it at 2, then you start
> looping and get a, aa, ab and stop when reading c.
>
> If user input is aa, binary search finds it at 3, then you start
> looping and get aa and stop when reading ab.
>
> If user input is z, binary search tells you it should be inserted at
> 7, and you extract z11, z12, z21
>
> If user input is y, binary search tells you it should be inserted at 7
> too, but you will stop at z11 without using any key.
>
> Be careful when implementing binary search, remember to make it easy
> you tipically start with low=0, high=count+1, not with 1, count as
> some textbooks do.
>
> Also, if you have few keys you could just loop, but as you want to
> avoid linear search I supose it is not appropiate.
>
> Francisco Olarte.
>
Hi Francisco,
Thank you for your help! :)
Parts of the information about the contents of the data are not
public available (like "closed source"), so I do have to be brief.
As I said, the size of the data storage has to be considered, as this
is an embedded system with 512 MB. Means: If there are two algorith
with the same result: Take the one with a smaller memory footprint.
There are roughly about 175000 keys, which should be searchable with
substrings which not necessarily start at the beginning of the
original key. That is: Key> Abrakadabra Search> kada
I will see, what road to go...
Best regards,
mcc