Sunday, May 29, 2016

Contextual spell checking for Tamil Language

This paper proposes a contextual spell correction feature for an existing spell checker that the authors developed using a hybrid approach. The existing spell checker integrates a dictionary check, a canti check, crowd sourcing and suggestion generation. In addition, this paper proposes a contextual spelling correction approach for mistakes that arise from confusion within the following sets of letters: {ḻakaram, ḷakaram, lakaram}, {ṇakaram, nakaram, ṉakaram} and {ṟakaram, rakaram}. A bigram language model is used to make suggestions for words that contain these confusable letters.

Hybrid approach for Tamil spell checking

An application to check Tamil spelling using a hybrid approach has been proposed and developed. This approach integrates a dictionary check, a canti check and crowd sourcing for new words. The Levenshtein distance algorithm is used to match words against the dictionary and flag misspelled words.
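
The dictionary step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `levenshtein` and `check_word`, the `max_distance` threshold and the tiny word list are all assumptions made for the example. Distances are computed over Unicode code points, so a Tamil consonant and its vowel sign count as separate symbols.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def check_word(word, dictionary, max_distance=1):
    """Return (is_correct, near_matches); max_distance is an assumed threshold."""
    if word in dictionary:
        return True, []
    near = sorted(w for w in dictionary if levenshtein(word, w) <= max_distance)
    return False, near


# Hypothetical three-word dictionary, for illustration only.
words = {"பழம்", "மழை", "மலை"}
ok, near = check_word("பளம்", words)
```

Here `பளம்` is flagged as misspelled and `பழம்` is returned as the only dictionary word within one edit.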
 
Grammatical rules have been written for Valliṉammikum and Valliṉammikā places to solve
the canti problems. For generating suggestions, an n-gram based technique is used. Further,
a feature called crowd sourcing has been added to the system to collect new words from
users. This is an important feature, as a lot of colloquial words are used in Tamil.
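
The post does not say how the n-gram technique scores candidates, so the following is only one plausible sketch: it ranks dictionary words by the Dice coefficient over shared letter bigrams. The names `char_ngrams`, `dice` and `suggest`, the boundary markers `^` and `$`, and the choice of Dice similarity are assumptions for illustration.

```python
def char_ngrams(word, n=2):
    """Set of letter n-grams, with ^ and $ marking the word boundaries."""
    padded = "^" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the letter n-gram sets of two words."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest(word, dictionary, k=3, n=2):
    """Top-k dictionary words by n-gram overlap; ties are broken alphabetically."""
    ranked = sorted(dictionary, key=lambda w: (-dice(word, w, n), w))
    return ranked[:k]


# English words are used here only to keep the example easy to read.
best = suggest("speling", ["spelling", "spoiling", "table"], k=1)
```

The same functions work on Tamil strings, since they operate on whatever symbols the string contains.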
 
The given word is first checked against the dictionary, using the Levenshtein algorithm, to see whether it exists there. If it does not, appropriate suggestions are generated using letter-level n-gram analysis. The word is then checked for the joining letter, the canti letter. After these steps, the word is checked to see whether it contains any letter from a confusion set. If it does, that letter is checked for appropriateness. This is done using a statistical approach with the aid of a bigram language model.
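
The confusion-set check can be sketched as follows, under several assumptions that are not from the paper: the model conditions on the previous word only, uses add-one smoothing, substitutes one letter at a time, and is trained on a two-sentence toy corpus. The names `CONFUSION_SETS`, `variants`, `BigramModel` and `best_variant` are hypothetical.

```python
from collections import Counter

# The three confusion sets named above, written as Tamil consonants.
CONFUSION_SETS = [("ழ", "ள", "ல"), ("ண", "ந", "ன"), ("ற", "ர")]


def variants(word):
    """Words obtained by swapping one confusable letter within its set."""
    out = {word}
    for i, ch in enumerate(word):
        for group in CONFUSION_SETS:
            if ch in group:
                for alt in group:
                    out.add(word[:i] + alt + word[i + 1:])
    return out


class BigramModel:
    """Word bigram counts with add-one smoothing (an assumed smoothing choice)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)


def best_variant(prev, word, model):
    """Pick the known variant most probable after prev; keep word on ties."""
    candidates = [v for v in sorted(variants(word)) if v in model.unigrams]
    if not candidates:
        return word
    return max(candidates, key=lambda v: (model.prob(prev, v), v == word))


# Toy corpus: "yesterday it rained" and "he climbed the mountain".
model = BigramModel(["நேற்று மழை பெய்தது", "அவன் மலை ஏறினான்"])
fixed = best_variant("நேற்று", "மலை", model)
```

With this toy corpus, `மலை` (mountain) after `நேற்று` (yesterday) is replaced by `மழை` (rain), because only the latter was seen in that context. A real system would need a much larger corpus and might also use the following word as context.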
