Hybrid approach for Tamil spell checking
An application that checks Tamil spelling using a hybrid approach has been proposed and developed. This approach integrates a dictionary approach, a canti check, and crowd sourcing for new words. The Levenshtein distance algorithm is used to match words against the dictionary and flag misspelled words.
Grammatical rules have been written for Valliṉam mikum and Valliṉam mikā places to solve
the canti problems. For generating suggestions, an n-gram based technique is used. Further,
a feature called crowd sourcing has been added to the system to collect new words from
users. This is an important feature, as many colloquial words are used in Tamil.
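The dictionary check and the n-gram based suggestion step can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the system itself: the three-word dictionary is a toy example, strings are compared by Unicode code point rather than by Tamil letter, a distance of zero is read as "the word exists", and candidates are ranked by a Dice coefficient over letter bi-grams. The names `levenshtein`, `char_ngrams`, `ngram_similarity` and `check_word` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, counted in Unicode code points."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_ngrams(word: str, n: int = 2) -> set:
    """Set of letter-level n-grams, with boundary markers added."""
    padded = "^" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over letter n-grams (an assumed ranking measure)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def check_word(word: str, dictionary: list, max_suggestions: int = 3):
    """Return (is_correct, suggestions) for a single word."""
    # A distance of zero to some entry means the word is in the dictionary.
    if any(levenshtein(word, entry) == 0 for entry in dictionary):
        return True, []
    # Otherwise rank entries by n-gram overlap; ties are broken alphabetically.
    ranked = sorted(dictionary, key=lambda e: (-ngram_similarity(word, e), e))
    return False, ranked[:max_suggestions]


# Toy dictionary, for illustration only.
DICTIONARY = ["அம்மா", "அப்பா", "தம்பி"]

ok, suggestions = check_word("அம்ம", DICTIONARY)
print(ok, suggestions[0])  # False அம்மா
```

Working on code points keeps the sketch short, but a vowel sign then counts as a separate symbol; a fuller implementation would probably segment words into Tamil letters before computing distances or n-grams.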
The given word is first checked against the dictionary, using the Levenshtein algorithm, to see whether it exists there. If it does not, appropriate suggestions are generated using letter-level n-gram analysis. The word is then checked for the joining letter, the canti letter. After these steps are complete, the word is checked to see whether it contains any letter that belongs to a confusion set. If such a letter is present, it is checked for appropriateness. This is done using a statistical approach with the aid of a bi-gram language model.
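The confusion-set step can be sketched as follows. Everything specific here is an assumption made for illustration: the confusion sets (ல/ள/ழ, ர/ற, ன/ண/ந) are commonly confused Tamil letters but are not stated to be the ones the system uses, the bi-gram model is taken to be a word-level model conditioned on the previous word, add-one smoothing is an arbitrary choice, and the three-sentence corpus is a toy. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Illustrative confusion sets; the actual sets are not given in the text.
CONFUSION_SETS = [{"ல", "ள", "ழ"}, {"ர", "ற"}, {"ன", "ண", "ந"}]


def variants(word: str) -> set:
    """All spellings obtained by swapping letters within their confusion set."""
    options = []
    for ch in word:
        group = next((s for s in CONFUSION_SETS if ch in s), None)
        options.append(sorted(group) if group else [ch])
    return {"".join(combo) for combo in product(*options)}


def train_bigrams(sentences):
    """Count word unigrams and bi-grams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing (an assumed smoothing scheme)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))


def best_variant(prev, word, unigrams, bigrams):
    """Pick the known variant that is most probable after the previous word."""
    candidates = [v for v in variants(word) if v in unigrams] or [word]
    # On equal probability, prefer the spelling the user actually typed.
    return max(candidates,
               key=lambda v: (bigram_prob(prev, v, unigrams, bigrams), v == word))


# Toy corpus: "மழை" (rain) and "மலை" (mountain) differ by one confusable letter.
CORPUS = ["இன்று மழை பெய்தது", "நேற்று மழை பெய்தது", "அந்த மலை உயரமானது"]
unigrams, bigrams = train_bigrams(CORPUS)

print(best_variant("இன்று", "மலை", unigrams, bigrams))  # மழை
print(best_variant("அந்த", "மலை", unigrams, bigrams))   # மலை
```

Enumerating every variant grows exponentially with the number of confusable letters in a word, so a practical system would likely restrict the candidates, for example to one substitution at a time.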