The system checks spelling in Punjabi-Hindi text and corrects errors using several techniques. A hybrid approach is used to implement the spell checking and correction system. It combines a dictionary lookup approach, a rule-based approach, an N-gram approach, and an edit distance approach, and it uses linguistic features of the Punjabi-Hindi language.
Dictionary lookup approach
- In this approach, each word of the input paragraph is checked against the entries in the database. If the scanned word is found in the database, it is considered correct; if it is not present in the database table, it is considered incorrect.
- Once a word is found to be incorrect, various handcrafted rules that take the linguistic features of the Punjabi-Hindi language into account are applied to generate its correct spelling. If this produces multiple candidates for a single word, the system uses statistical analysis to choose the most appropriate one and substitutes it for the incorrect word to produce the result.
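The lookup step described above can be sketched as follows. This is only an illustration under stated assumptions: the database table is replaced by an in-memory set, the words are romanized placeholders rather than real Gurmukhi or Devanagari entries, and the names `LEXICON` and `find_unknown_words` are hypothetical.

```python
import string

# Hypothetical word list standing in for the database table.
# Romanized placeholders; a real system would store native-script words.
LEXICON = {"mera", "naam", "ghar", "pani"}

# ASCII punctuation plus the danda marks used in Hindi and Punjabi text.
PUNCTUATION = string.punctuation + "।॥"


def find_unknown_words(text, lexicon):
    """Return the words of `text` that are missing from `lexicon`, in order."""
    unknown = []
    for token in text.split():
        word = token.strip(PUNCTUATION)
        # Skip tokens that were pure punctuation.
        if word and word not in lexicon:
            unknown.append(word)
    return unknown
```

Calling `find_unknown_words("mera naam gharr ।", LEXICON)` flags only `gharr`, which would then be handed to the correction steps.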
Edit Distance
- This method is based on the assumption that a person usually makes few errors, if any, in a word. For each dictionary word, it therefore computes the minimum number of basic editing operations (insertions, deletions, substitutions) required to convert the dictionary word into the non-word. The lower this number, the higher the probability that the user made exactly those errors.
- In other words, the edit distance between two words is the minimum number of add, delete, and modify operations needed to change one word into the other.
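A minimal sketch of this computation is the standard Levenshtein dynamic program shown here. The post does not say which variant the system uses, so the unit cost per operation, the `max_distance` threshold, and the function names are assumptions. The code counts Unicode code points, so in native script a vowel sign or other combining mark would count as its own character.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target` (Levenshtein distance)."""
    # previous[j] is the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest_words(non_word, lexicon, max_distance=2):
    """Dictionary words within `max_distance` edits, nearest first."""
    scored = [(edit_distance(word, non_word), word) for word in lexicon]
    return sorted(pair for pair in scored if pair[0] <= max_distance)
```

For the romanized placeholder `gharr`, `closest_words("gharr", {"ghar", "pani", "naam"})` keeps only `ghar`, which is one insertion away.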
Rule based Approach
- In this approach, handcrafted rules are written based on the features of the Punjabi-Hindi language.
- These rules are applied to the words in the paragraph that are not found in the database.
- With the help of these rules, the system attempts to generate the exact spelling of the word under examination.
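The post does not list the actual rules, so the following is only a sketch of how such rules could be applied. The rewrite rules in `RULES` are invented placeholders over romanized text, not real Punjabi-Hindi rules, and `apply_rules` is a hypothetical name.

```python
# Hypothetical (pattern, replacement) rules. Real rules would encode
# Punjabi-Hindi linguistic knowledge that the post does not spell out.
RULES = [("ee", "i"), ("oo", "u"), ("rr", "r")]


def apply_rules(word, rules, lexicon):
    """Apply each rule to `word` and keep rewrites found in `lexicon`."""
    candidates = []
    for pattern, replacement in rules:
        if pattern in word:
            candidate = word.replace(pattern, replacement)
            # Only rewrites that are known words count as corrections.
            if candidate in lexicon and candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

An empty result means no rule produced a known word, and a result with several entries means the rules alone could not settle on one spelling.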
N-Gram Analysis
- This step is used when the rule-based approach fails to generate an appropriate word for an incorrect word.
- Here the system tries to find the correct word by looking at its neighboring words and comparing them with the existing paragraphs stored in the system.
- This method also helps to identify the correct word when the rule-based approach generates multiple candidate words.
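One way to picture the neighbor-based selection is with bigram counts, as sketched here. The post does not give the N-gram order or the scoring formula, so the use of bigrams, the simple count-sum score, the tiny romanized corpus in the usage note, and the function names are all assumptions.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs in the stored reference text."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def pick_by_context(candidates, left, right, counts):
    """Choose the candidate seen most often next to its neighbors.

    `left` and `right` are the words before and after the misspelled
    word; on a tie the earliest candidate in the list is returned.
    """
    def score(candidate):
        return counts[(left, candidate)] + counts[(candidate, right)]

    return max(candidates, key=score)
```

With a stored corpus such as `["mera ghar bada hai", "mera ghar door hai", "ghat par pani hai"]`, the call `pick_by_context(["ghat", "ghar"], "mera", "bada", counts)` prefers `ghar`, because it appears next to both neighbors in the corpus while `ghat` appears next to neither.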