Popis: |
As increasingly advanced data analysis techniques emerge, people expect them to be applied to more complex tasks and to solve problems in daily life. Text summarization is one of the best-known applications in the field of Natural Language Processing (NLP). It aims to automatically generate a summary that captures the important information in a given context, which is valuable when dealing with large volumes of documents. Summarization techniques help readers capture key points quickly and make their work more convenient. One applicable scenario is meeting summarization, especially for important meetings, which tend to be long, complicated, multi-topic, and multi-speaker. As a result, when people want to review specific content from a meeting, finding the related spans in the meeting transcript is hard and time-consuming. However, most previous work focuses on summarizing newsletters, scientific articles, and similar documents that have a clear structure and a formal format. We believe these approaches are not well suited to documents with a more complex structure, such as meeting transcripts. In addition, the consistency of generated summaries is another commonly discussed issue in the NLP field. To address the challenges of meeting summarization, we draw inspiration from "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization", proposed by Microsoft, and propose our Locater model, which extracts the spans relevant to a given query from a given transcript; these spans are then summarized by a Summarizer model. Furthermore, we perform a comparative study in which we apply different word embedding techniques to improve summary consistency. |