- Subject: Re: folding double vowels?
- From: Petite Abeille <petite.abeille@...>
- Date: Wed, 22 May 2013 21:57:48 +0200
On May 22, 2013, at 8:47 PM, Geoff Leyland <geoff_leyland@fastmail.fm> wrote:
> This is only a slight improvement, but it seems to work.
>
>> print( ( 'aaaa aaa aa a aaaaa' ):gsub( '([aeiou])[aeiou]', '%1' ))
Perfect. Combined with ASCII transliteration, and, voilà, a very simple word normalization of sorts:
-- Unidecode (ASCII transliteration) is defined elsewhere; it is not part of standard Lua
local words = {
  'blåbærsyltetøj', 'blåbärsyltetöj', 'blaabaarsyltetoej', 'blabarsyltetoj',
  'Räksmörgås', 'Göteborg', 'Gøteborg',
  'Über', 'ueber', 'uber', 'uuber'
}
for index, word in ipairs( words ) do
  -- gsub returns the folded string and the number of substitutions made
  print( index, Unidecode( word ):lower():gsub( '([aeiou])[aeiou]', '%1' ) )
end
1 blabarsyltetoj 1
2 blabarsyltetoj 0
3 blabarsyltetoj 3
4 blabarsyltetoj 0
5 raksmorgas 0
6 goteborg 0
7 goteborg 0
8 uber 0
9 uber 1
10 uber 0
11 uber 1
Examples courtesy of https://issues.apache.org/jira/browse/LUCENE-5013
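For anyone who wants to try the pipeline without Unidecode at hand, here is a minimal sketch under stated assumptions. `transliterate` and its `TRANSLIT` table are a hypothetical stand-in that covers only the letters appearing in the examples above; they are not Unidecode's actual mapping or API. The sketch also contrasts Geoff's pattern, which folds *any* two adjacent vowels, with a stricter back-reference pattern (`%1` inside a Lua pattern matches a repeat of the first capture), which folds only identical pairs.

```lua
-- ASSUMPTION: TRANSLIT/transliterate are a hypothetical stand-in for
-- Unidecode, covering only the letters used in the examples above.
-- Save this file as UTF-8 so the table keys are UTF-8 byte sequences.
local TRANSLIT = {
  ["å"] = "a", ["Å"] = "A",
  ["æ"] = "ae", ["Æ"] = "AE",
  ["ø"] = "o", ["Ø"] = "O",
  ["ä"] = "a", ["Ä"] = "A",
  ["ö"] = "o", ["Ö"] = "O",
  ["ü"] = "u", ["Ü"] = "U",
}

local function transliterate(s)
  for from, to in pairs(TRANSLIT) do
    -- Bytes >= 0x80 are not pattern magic characters, so each
    -- UTF-8 sequence is matched literally.
    s = s:gsub(from, to)
  end
  return s
end

-- Fold any two adjacent vowels to the first one ("aa" -> "a", "ue" -> "u").
local function normalize(word)
  local folded = transliterate(word):lower():gsub("([aeiou])[aeiou]", "%1")
  return folded
end

-- Stricter variant: %1 in the pattern only matches a repeat of the
-- captured vowel, so "aa" -> "a" but "ue" is left alone.
local function fold_doubles_only(word)
  local folded = transliterate(word):lower():gsub("([aeiou])%1", "%1")
  return folded
end

for _, word in ipairs({ "blåbærsyltetøj", "Räksmörgås", "Über", "ueber" }) do
  print(word, normalize(word), fold_doubles_only(word))
end
```

The two functions disagree on words whose vowel pairs are not repeats: "blåbærsyltetøj" (via the "ae" from "æ") and "ueber" both fold with the original pattern but keep their pair with the back-reference. That looser behaviour is what lets "ueber" and "Über" normalize to the same "uber".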