Online Handwritten Script Recognition
The goal of topic-oriented text summarization is to produce an informative short description according to a given topic or query. This is somewhat similar to the goal of question answering, which retrieves exact answers from large text collections. In this paper, we present a lightweight and rule-free summarization technique. Our method relies on a two-pass re-ranking framework. The first pass orders the concepts, which are clustered with a conventional top-down clustering algorithm. The second pass generates representative sentences from the top N concepts. The main advantage of our work is that we do not need to build external knowledge or pre-defined rules. This is our first participation in DUC. Although the results of our system are not comparable with those of most top-performing methods, the lightweight, rule-free techniques encourage us to improve further by integrating rich sources.
In existing methods, the script or language is identified:
- Using projection profiles of words and character shapes.
- Using horizontal projection profiles and looking for the presence or absence of specific shapes in different scripts.
- Existing methods consider only a few characteristics.
- Most of these methods operate offline.
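As a rough illustration of the horizontal projection profile mentioned above, the following sketch counts the foreground pixels in each row of a binary image. The class name `ProjectionProfile`, its methods, and the `boolean[][]` image representation are assumptions made for this example and are not taken from any of the existing systems.

```java
// Illustrative sketch only: names and the image representation are hypothetical.
class ProjectionProfile {

    /** Counts the foreground (true) pixels in each row of a binary image. */
    static int[] horizontalProfile(boolean[][] image) {
        int[] profile = new int[image.length];
        for (int row = 0; row < image.length; row++) {
            for (int col = 0; col < image[row].length; col++) {
                if (image[row][col]) {
                    profile[row]++;
                }
            }
        }
        return profile;
    }

    /** Returns the index of the densest row (the first one on ties), or -1 if empty. */
    static int peakRow(int[] profile) {
        int best = -1;
        for (int row = 0; row < profile.length; row++) {
            if (best < 0 || profile[row] > profile[best]) {
                best = row;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[][] image = {
            {false, false, false, false},
            {true,  true,  true,  true},   // a long horizontal run of ink
            {false, true,  false, false},
            {false, true,  false, false}
        };
        int[] profile = horizontalProfile(image);
        System.out.println(java.util.Arrays.toString(profile));
        System.out.println("Densest row: " + peakRow(profile));
    }
}
```

A pronounced peak near the top of a word image is the kind of shape cue such methods can look for, since scripts written with a connecting headline tend to produce one. How any particular system thresholds or compares these profiles is not specified here.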
The proposed method uses features of connected components to classify six different scripts (Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reports a classification accuracy of 88 percent on document pages. There are a few important aspects of online documents that enable us to process them in a fundamentally different way from offline documents. The most important characteristic of online documents is that they capture the temporal sequence of strokes as the document is written. This allows us to analyze the individual strokes and to use the additional temporal information for both script identification and text recognition.
In the case of online documents, segmentation of the foreground from the background is a relatively simple task: the captured data, i.e., the (x, y) coordinates of the locus of the stylus, define the characters, and any other point on the page belongs to the background. We use stroke properties as well as the spatial and temporal information of a collection of strokes to identify the script used in the document. Unfortunately, the temporal information also introduces additional variability into the handwritten characters, which creates large intraclass variations of strokes within each of the script classes.
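To make the idea of stroke properties concrete, the sketch below treats a stroke as a time-ordered list of (x, y) samples and computes two simple properties from it. The class `StrokeSketch`, the `Point` type, and the two chosen properties are assumptions for illustration only; they are not the feature set of the proposed method.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and the chosen properties are hypothetical.
class StrokeSketch {

    /** One stylus sample; samples are stored in the order they were captured. */
    static class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Path length: the sum of distances between consecutive samples. */
    static double length(List<Point> stroke) {
        double total = 0.0;
        for (int i = 1; i < stroke.size(); i++) {
            Point a = stroke.get(i - 1);
            Point b = stroke.get(i);
            total += Math.hypot(b.x - a.x, b.y - a.y);
        }
        return total;
    }

    /** Net x displacement from the first to the last sample (0 for an empty stroke). */
    static double horizontalDisplacement(List<Point> stroke) {
        if (stroke.isEmpty()) {
            return 0.0;
        }
        return stroke.get(stroke.size() - 1).x - stroke.get(0).x;
    }

    public static void main(String[] args) {
        List<Point> stroke = new ArrayList<Point>();
        stroke.add(new Point(0, 0));
        stroke.add(new Point(3, 4));
        stroke.add(new Point(6, 0));
        System.out.println("Length: " + length(stroke));
        System.out.println("Net x displacement: " + horizontalDisplacement(stroke));
    }
}
```

The sign of the net displacement depends on the order in which the samples were captured, so it is available only for online data. Assuming x increases to the right, a positive value indicates pen movement from left to right. The same shape drawn in the opposite direction yields the same length but the opposite sign, which is one example of the extra variability that temporal information brings.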
Operating System : Windows XP
Languages : Java 1.6
Tools : NetBeans/Eclipse
Processor : 600 MHz or above
RAM (SD/DDR) : 256 MB
Hard Disk : 30 GB