Being able to group product descriptions in clusters of “similar” descriptions can significantly accelerate the data refinement process.
The challenge is to group the product descriptions into distinct hierarchical clusters of similar products, automatically and without human intervention.
The role of the Clusterer is to group product descriptions that share a certain level of similarity into distinct, hierarchical clusters. Organizing the product descriptions in clusters enables us to design the taxonomy, to apply editing activities to a whole group, and to assign relevant clusters to the appropriate subject domain experts, who continue the refinement process.
The preferred way to use the Clusterer is after formalization but before the rest of the refinement activities. To run the Clusterer, the user may set a few parameters and then let it run automatically; no further human intervention is required.
While reviewing the generated clusters, the user can correct common spelling or writing errors within a cluster, assign the cluster to the relevant subject domain experts, or use the clusters as an initial taxonomy design.
The Clusterer is based mainly on “minimum entropy” algorithms.
The Clusterer can produce several deliverables. The first and most basic are the clusters themselves: distinct, hierarchical groups of similar product descriptions. The others are an initial taxonomy structure and a list of distinctive keywords with their statistical characteristics.
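To give a feel for the “minimum entropy” idea, here is a minimal Python sketch. It is an illustration only and not the Clusterer's actual implementation, which is not described here. It assumes that descriptions are split into whitespace-separated tokens and that a grouping is scored by the size-weighted Shannon entropy of the token distribution in each cluster; the function names and sample descriptions are hypothetical. Under these assumptions, a grouping that keeps similar descriptions together scores lower than one that mixes them.

```python
import math
from collections import Counter


def cluster_entropy(descriptions):
    """Shannon entropy (in bits) of the token distribution in one cluster."""
    # Assumption: tokens are lowercase, whitespace-separated words.
    tokens = [tok for d in descriptions for tok in d.lower().split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def partition_entropy(clusters):
    """Size-weighted mean of the cluster entropies for a whole grouping."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


# Hypothetical sample data, invented for illustration.
bolts = ["hex bolt m8 steel", "hex bolt m10 steel"]
cables = ["usb cable 1m black", "usb cable 2m black"]

similar_grouping = [bolts, cables]
mixed_grouping = [[bolts[0], cables[0]], [bolts[1], cables[1]]]

print(partition_entropy(similar_grouping))  # lower: shared tokens repeat
print(partition_entropy(mixed_grouping))    # higher: every token is unique
```

A minimum-entropy clusterer would search for the grouping with the lowest such score; the search strategy and the construction of the hierarchy are left out of this sketch.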