One of the most important, and most difficult, tasks is assigning the right category to a product description, that is, classifying product descriptions correctly against a given taxonomy hierarchy.
Classifying a batch of product data automatically, consistently, correctly and efficiently, in any format, structure or language, is a major challenge.
The role of the Classifier is to auto-classify any product description, in any language or mix of languages and in any structure or format, against a given taxonomy hierarchy, efficiently, consistently and accurately.
We use the Classifier before data extraction and de-duplication, preferably after formalization. The classification process starts by auto-classifying the data based on existing Knowledge Bases. It continues with an interactive session in which subject domain experts "train" (teach) the system by providing a few examples. The Classifier then uses these examples to classify the product data automatically, the experts provide a few more examples from the products that remain unclassified, and the cycle repeats until all required products are classified.
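The classify, teach, and re-classify cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Classifier's actual implementation: the token-overlap (Jaccard) matching, the 0.3 threshold, and the names classify_one, classify_batch and ask_expert are all hypothetical stand-ins chosen to show the shape of the loop.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical knowledge base: (example description, category) pairs.
KnowledgeBase = List[Tuple[str, str]]


def tokens(text: str) -> set:
    """Lower-cased word tokens; a crude stand-in for real text handling."""
    return set(re.findall(r"\w+", text.lower()))


def classify_one(description: str, kb: KnowledgeBase,
                 threshold: float = 0.3) -> Optional[str]:
    """Return the category of the most similar example, or None if no
    example reaches the (assumed) similarity threshold."""
    query = tokens(description)
    best_score, best_category = 0.0, None
    for example, category in kb:
        example_tokens = tokens(example)
        union = query | example_tokens
        score = len(query & example_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_score, best_category = score, category
    return best_category if best_score >= threshold else None


def classify_batch(descriptions: List[str], kb: KnowledgeBase,
                   ask_expert: Callable[[str], str],
                   examples_per_round: int = 1
                   ) -> Tuple[Dict[str, str], KnowledgeBase]:
    """Auto-classify from the knowledge base, then repeatedly ask the
    expert for a few examples and retry until nothing is left."""
    kb = list(kb)                      # do not mutate the caller's list
    labels: Dict[str, str] = {}
    pending = list(descriptions)
    step = max(1, examples_per_round)  # guarantee progress every round
    while pending:
        unclassified = []
        for description in pending:
            category = classify_one(description, kb)
            if category is None:
                unclassified.append(description)
            else:
                labels[description] = category
        # The expert labels a few of the remaining items; these become
        # new examples in the knowledge base for the next round.
        for description in unclassified[:step]:
            category = ask_expert(description)
            kb.append((description, category))
            labels[description] = category
        pending = unclassified[step:]
    return labels, kb


if __name__ == "__main__":
    seed_kb = [("stainless steel hex bolt M8", "Fasteners")]
    batch = ["hex bolt M8 steel zinc", "ballpoint pen blue",
             "pen blue ink ballpoint 0.7mm", "gel pen black"]
    result, grown_kb = classify_batch(batch, seed_kb, lambda d: "Writing")
    for description, category in result.items():
        print(f"{category:10s} {description}")
```

The function returns both the classified descriptions and the enlarged set of examples, so the examples gathered in one session can seed the next one.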
The technology behind the Classifier is "Machine Learning by Examples", adapted to handle unstructured data with no semantics.
There are two deliverables when using the Classifier. The first is the product descriptions classified by a given taxonomy hierarchy; the second is a collection of examples, organized as a knowledge base that can be used in the future to auto-classify other products.