Popis: |
The growth of the internet has allowed us to express and share our opinions on an unprecedented scale, while the pace of modern life has made us busier than ever. As a result, many people make decisions based on only a small sample of data, even though vast amounts of pertinent information are available. Multi-document summarization (MDS) methods could help alleviate this problem. The topic of this thesis is MDS methods for movie reviews, in particular the application of methods traditionally used on texts from other domains to texts from the movie review domain.
Study the literature on MDS and movie review summarization. Devise an MDS system for the IMDb dataset. IMDb currently shows only the "most helpful" review on the front page for each movie, leaving the hundreds or thousands of other reviews of that movie unrepresented. Your task is to devise a model capable of producing a coherent and helpful review that better represents a set of other reviews and is more useful to a user deciding whether to watch the movie. In addressing this task, rely on descriptive statistics to gain an understanding of the data, devise an evaluation strategy, implement baseline models, and then develop and train at least one state-of-the-art NLP model commonly used for this problem. Perform a thorough evaluation of the model, a comparison against sensible baselines, a detailed error analysis, and a statistical analysis of the results. |