Traditional methods of summarization were very costly and time-consuming. This led to the emergence of automatic methods for text summarization. Extractive summarization is an automatic method for generating summary by identifying the most important sentences of a text. In this paper, two innovative approaches are presented for summarizing the Persian texts. In these methods, using a combination of deep learning and statistical methods, we cluster the concepts of the text and, based on the importance of the concepts in each sentence, we derive the sentences that have the most conceptual burden. In the first unsupervised method, without using any hand-crafted features, we achieved state-of-the-art results on the Pasokh single-document corpus as compared to the best supervised Persian methods. In order to have a better understanding of the results, we have evaluated the human summaries generated by the contributing authors of the Pasokh corpus as a measure of the success rate of the proposed methods. In terms of recall, these have achieved favorable results. In the second method, by giving the coefficient of title effect and its increase, the average ROUGE-2 values increased to 0.4% on the Pasokh single-document corpus compared to the first method and the average ROUGE-1 values increased to 3% on the Khabir news corpus.
Traditional methods of summarization are not cost-effective and possible today. Extractive summarization is a process that helps to extract the most important sentences from a text automatically, and generates a short informative summary. In this work, we propose a novel unsupervised method to summarize Persian texts. The proposed method adopt a hybrid approach that clusters the concepts of the text using deep learning and traditional statistical methods. First we produce a word embedding based on Hamshahri2 corpus and a dictionary of word frequencies. Then the proposed algorithm extracts the keywords of the document, clusters its concepts, and finally ranks the sentences to produce the summary. We evaluated the proposed method on Pasokh single-document corpus using the ROUGE evaluation measure. Without using any hand-crafted features, our proposed method achieves better results than the state-of-the-art related work results. We compared our unsupervised method with the best supervised Persian methods and we achieved an overall improvement of ROUGE-2 recall score of 7.5%.
In this paper, we propose an unsupervised method for summarizing Farsi texts based on our neural named entity recognition (NER) system. This method consists of three phases: training a supervised NER model, recognizing named entities of the text, and generating a summary. The proposed method is an unsupervised extractive single-document summarization method. Although the proposed method is language independent, we focus on Farsi text summarization in this work. Firstly, we produce a word embedding based on Hamshahri2 corpus. Secondly, we train a neural network on Arman NER corpus. Then, the proposed algorithm ranks the sentences of the text based on the named entities in each sentence and produces the summary. Finally, the proposed method is evaluated on Pasokh single-document data set using the ROUGE evaluation measure. Without using any handcrafted features, our proposed method achieves state-of-the-art results. We compared our unsupervised method with the best supervised Farsi methods, and we achieved an overall improvement of ROUGE-2 recall score of 10.2% .