Spring 2004

MATH710 and CMSC691 Introduction to Computational Information Retrieval


Large collections of text documents are increasingly common and readily available. Mining such data sets is a major contemporary challenge. One approach to the problem is to transform the set of documents into vectors in a finite-dimensional Euclidean space and to deal with vectors rather than texts. The course will focus on vector space models, and on linear algebra and clustering techniques for handling large data sets with a limited amount of resources (e.g. memory and CPU cycles).
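As a rough illustration of the document-to-vector idea, the sketch below maps a few short texts to raw term-count vectors and compares them by the cosine of the angle between them. It is only one simple instance of a vector space model, not necessarily the formulation used in the course: the sample sentences, the whitespace tokenization, and the function names (build_vocabulary, to_vector, cosine_similarity) are all assumptions made for this example.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct lowercase words in all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def to_vector(document, vocabulary):
    """Map a document to a vector of raw term counts, one entry per vocabulary word."""
    counts = Counter(document.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample collection, chosen only for illustration.
documents = [
    "matrix methods for text retrieval",
    "clustering text documents with matrix methods",
    "spring weather in the city",
]

vocabulary = build_vocabulary(documents)
vectors = [to_vector(doc, vocabulary) for doc in documents]

if __name__ == "__main__":
    # Compare the first document with each of the others.
    for i in (1, 2):
        print(i, round(cosine_similarity(vectors[0], vectors[i]), 4))
```

Each document becomes a point in a space whose dimension equals the vocabulary size, so documents that share words end up at a smaller angle to each other than documents that share none. For a realistic collection these vectors are very sparse, which is one reason memory and CPU limits matter.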
Potential topics to be covered