PJATK Repository

Clustering and summarization of news articles

Open Science Centre Repository


dc.contributor.author Darnowski, Mateusz
dc.date.accessioned 2023-02-23T13:40:36Z
dc.date.available 2023-02-23T13:40:36Z
dc.date.issued 2023-02-23
dc.identifier.issn 2022/I/D/20
dc.identifier.uri https://repin.pjwstk.edu.pl/xmlui/handle/186319/2532
dc.description.abstract The proposed solution often fails to identify specific important events effectively; however, the model accurately describes the internal debate conducted in the managed set of articles. Newspapers contain arguments and support contrasting philosophies. The model-generated summaries cite and copy common perspectives disseminated in the documents. The universality of the quoted opinions could only be evaluated by extending the processed dataset with records from more diverse data sources. The model behaves correctly when reviewing a small set of related papers and identifies unique sub-topics appearing in a dataset. Topic summaries differ distinguishably in content and context, even if the topics discovered by the model share similar keywords. Increasing the number of articles to be summarized results in more diverse topics; however, the resulting summaries include more quotes and opinions. Complex and lengthy sentences heavily influence the results of similarity measurements and document sorting. The model favors long sentences with varied vocabulary that can easily be linked to numerous discussions, which can hurt the positioning of short, condensed sentences that should be valued in a summary. A sentiment analysis study and a reconsideration of the model's assumptions could help in producing more condensed and less opinion-focused summaries. However, the presented solution successfully captures and outlines the background of conversations conducted in the analyzed reports, and introduces arguments that could be omitted by excluding long sentences and opinions from processing and summarization. The model provides coherent and compelling topics and offers interpretations relevant to the public discourse. The use of Gibbs sampling effectively improves the accuracy of LDA topic-cluster labeling. The proposed paragraph extraction method successfully removes noise and simplifies topic modeling assignments. The approach proposed in 6.3.4, "Ranking paragraphs on a topic", outperforms TextRank considerably when handling a set of paragraphs; sentences added to summaries are more coherently linked and describe the issues identified in the corpus more clearly. The in-depth study and reuse of LDA likelihood estimates provide convincing overviews of the discussions initiated in the analyzed datasets. pl_PL
dc.language.iso en pl_PL
dc.relation.ispartofseries ;Nr 7087
dc.subject Computer science pl_PL
dc.subject Intelligent systems pl_PL
dc.title Clustering and summarization of news articles pl_PL
dc.type Thesis pl_PL


Files in this item

There are no files associated with this item.
