GENERATION OF A SET OF KEY TERMS CHARACTERISING TEXT DOCUMENTS

Kristina Machova; Andrea Szaboova; Peter Bednar

Keywords:

text documents, key terms generation, TF-IDF method, information gain, mutual information, term relation

Abstract

This paper describes statistical methods (information gain, mutual information, χ² statistics, and the TF-IDF method) for generating key words from a collection of text documents. These key words should characterise the content of the documents and can be used to retrieve relevant documents from the collection. Term relations were detected on the basis of the conditional probability of term occurrences, with a focus on words that very often occur together. In this way, key words consisting of two terms were generated as well. Several tests were carried out using the 20 Newsgroups collection of text documents.
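
As a rough illustration of two of the ideas named in the abstract, the sketch below scores terms with TF-IDF and estimates the conditional probability that one term occurs in a document given that another does. It is only a minimal sketch under assumptions: the abstract does not give the exact formulas, so the weighting used here (raw term count times log(N/df)), the document-level co-occurrence estimate, the function names `tf_idf` and `conditional_prob`, and the toy documents are all illustrative and not taken from the paper.

```python
import math
from collections import Counter
from itertools import permutations


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def conditional_prob(docs):
    """Estimate P(b occurs in a document | a occurs in it) for term pairs.

    Assumed estimate: (documents containing both a and b) / (documents containing a).
    """
    df = Counter(term for doc in docs for term in set(doc))
    pair = Counter(
        p for doc in docs for p in permutations(sorted(set(doc)), 2)
    )
    return {(a, b): c / df[a] for (a, b), c in pair.items()}


# Toy, hand-made documents (hypothetical, not from the 20 Newsgroups data).
docs = [
    ["space", "shuttle", "launch"],
    ["space", "shuttle", "orbit"],
    ["hockey", "game", "score"],
]
scores = tf_idf(docs)
probs = conditional_prob(docs)
```

In this toy example, pairs with a high conditional probability in both directions, such as ("space", "shuttle"), would be candidates for two-term key words, while the TF-IDF weights rank single terms within each document.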

Author Biographies

Kristina Machova

Technical University, Košice

Andrea Szaboova

Technical University, Košice

Peter Bednar

Technical University, Košice
