  • ASCII file parsing and word frequency count
  • building vectors for document collections
  • building vectors for document collections (phase 2)
  • clustering with PDDP
  • classical batch k-means augmented by incremental iterations