Elevate Suite FACTORY: Extractor

Challenge: transforming unstructured, multilingual product descriptions, automatically and properly, into a normalized table of attributes and values

Role: to represent product data as a table of normalized attribute values instead of free text

Use: after classification, iterations of auto-extraction, each followed by training the system with new manual examples

Technology: fuzzy matching, pattern recognition, Inductive Logic Programming, machine learning

Deliverables: product data in a table structure, a knowledge base of examples, lexicons

A major purpose is to transform the textual product description, within each category, into a table of normalized attributes and values that reflects the product's characteristics in a clear, consistent and unambiguous manner. We love to call it "the single version of the truth".
The challenge is to take classified product descriptions in any language, dialect or structure, which contain synonyms, abbreviations, spelling mistakes, different systems of units of measurement, etc., and to transform them, automatically and accurately, into a normalized table of appropriate technical attributes and values.
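To make the kinds of variation listed above concrete, here is a minimal Python sketch of the normalization step only, assuming a single length attribute and a small material lexicon. The unit table, the lexicon entries, and the attribute names `length_mm` and `material` are hypothetical illustrations, not the Extractor's actual knowledge base, and the regular-expression approach is a stand-in rather than the product's method.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical conversion factors into one base unit (millimetres).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4, "inch": 25.4, '"': 25.4}

# Hypothetical lexicon: raw material terms and abbreviations -> standard term.
MATERIALS = {
    "ss": "stainless steel",
    "s/s": "stainless steel",
    "inox": "stainless steel",
    "stainless": "stainless steel",
    "stl": "steel",
    "steel": "steel",
    "brs": "brass",
    "brass": "brass",
}

# A number (decimal point or decimal comma) followed by a length unit
# that is not the beginning of a longer word.
LENGTH_RE = re.compile(r'(\d+(?:[.,]\d+)?)\s*(mm|cm|inch|in|m|")(?![a-z])', re.IGNORECASE)


def normalize_length(text: str) -> Optional[float]:
    """Return the first length found in `text`, converted to millimetres."""
    match = LENGTH_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))  # accept decimal commas
    return round(value * TO_MM[match.group(2).lower()], 2)


def normalize_material(text: str) -> Optional[str]:
    """Map the first known material token (synonym or abbreviation) to its standard term."""
    for token in text.lower().split():
        standard = MATERIALS.get(token.strip(".,;:()"))
        if standard is not None:
            return standard
    return None


def normalize(description: str) -> Dict[str, Union[float, str, None]]:
    """Turn one free-text description into a row of normalized attribute values."""
    return {
        "length_mm": normalize_length(description),
        "material": normalize_material(description),
    }


for raw in ["Bolt hex 1.5in SS", "Perno 40 mm inox"]:
    print(normalize(raw))
```

Both inputs end up with the same attribute names, the same unit and the same material term, even though one is written in inches with an English abbreviation and the other in millimetres with a non-English term.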

The role of the Extractor is to automatically transform any classified product description into a normalized table of attributes and values, efficiently and properly.

We use the Extractor after classification. The extraction process starts with auto-extraction of the data using the existing knowledge bases. This is followed by an interactive session in which subject-domain experts "train" (teach) the system by manually extracting one product description and then letting the Extractor use this example to automatically extract values from other product descriptions. The session is repeated, with more manual examples taken from the products not yet extracted, until all required products are properly extracted.
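The train-by-example loop described above can be sketched in Python as follows. This is a deliberately simple stand-in technique (generalizing one manually extracted example into a regular expression), not ILP and not the Extractor's algorithm; the helper names `learn_pattern` and `extract_all` and the attribute names are hypothetical, and attribute names are assumed to be valid Python identifiers because they become named regex groups.

```python
import re
from typing import Dict, List, Pattern, Tuple


def shape(value: str) -> str:
    """Generalize a value into a regex: letter runs and digit runs become
    character classes; any other character (e.g. '.', '-') is kept literally."""
    parts = []
    for run in re.findall(r"[A-Za-z]+|[0-9]+|.", value):
        if re.fullmatch(r"[0-9]+", run):
            parts.append(r"[0-9]+")
        elif re.fullmatch(r"[A-Za-z]+", run):
            parts.append(r"[A-Za-z]+")
        else:
            parts.append(re.escape(run))
    return "".join(parts)


def learn_pattern(description: str, example: Dict[str, str]) -> Pattern[str]:
    """Build a pattern from one manual example: each value becomes a named group
    with the value's shape, and the literal text between values is kept."""
    spans: List[Tuple[int, int, str]] = []
    for attribute, value in example.items():
        start = description.find(value)
        if start < 0:
            raise ValueError(f"{attribute}={value!r} not found in {description!r}")
        spans.append((start, start + len(value), attribute))
    spans.sort()
    pieces: List[str] = []
    previous_end = None
    for start, end, attribute in spans:
        if previous_end is not None:
            if start < previous_end:
                raise ValueError("example values overlap in the description")
            pieces.append(re.escape(description[previous_end:start]))
        pieces.append(f"(?P<{attribute}>{shape(description[start:end])})")
        previous_end = end
    return re.compile("".join(pieces))


def extract_all(
    descriptions: List[str], patterns: List[Pattern[str]]
) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Apply the learned patterns in order; return extracted rows and pending descriptions."""
    extracted: Dict[str, Dict[str, str]] = {}
    pending: List[str] = []
    for description in descriptions:
        for pattern in patterns:
            match = pattern.search(description)
            if match:
                extracted[description] = match.groupdict()
                break
        else:
            pending.append(description)
    return extracted, pending


catalog = ["Screw M8x35 brass", "Stud M10x100", "Washer 8.4 mm"]
patterns = [learn_pattern("Bolt M6x20 SS", {"thread": "M6", "length": "20"})]
extracted, pending = extract_all(catalog, patterns)
# An expert extracts one still-pending description by hand; the loop runs again.
patterns.append(learn_pattern("Washer 8.4 mm", {"inner_diameter": "8.4"}))
extracted, pending = extract_all(catalog, patterns)
print(extracted)
print(pending)
```

The first pass extracts the screw and the stud from a single bolt example and leaves the washer pending; one more manual example then covers it. A real system would need far richer generalization, but the iteration structure is the same.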

The technology behind the Extractor consists of methods such as fuzzy matching, pattern recognition, Inductive Logic Programming (ILP) and "Machine Learning by Examples", adapted to handle unstructured data that carries no explicit semantics.
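As an illustration of fuzzy matching alone, the sketch below maps misspelled or regionally spelled raw terms to standard terms using Python's standard `difflib.get_close_matches` (a Ratcliff/Obershelp-style similarity score). The list of standard terms and the 0.8 cutoff are assumptions chosen for the example, and `fuzzy_standardize` is a hypothetical helper, not part of the Extractor.

```python
import difflib
from typing import List, Optional

# Hypothetical standard terms taken from a category lexicon.
STANDARD_TERMS = ["stainless", "aluminium", "brass", "galvanized"]


def fuzzy_standardize(
    raw: str, terms: List[str] = STANDARD_TERMS, cutoff: float = 0.8
) -> Optional[str]:
    """Return the standard term most similar to `raw`, or None if no term
    reaches the similarity `cutoff` (0.0 to 1.0)."""
    matches = difflib.get_close_matches(raw.lower(), terms, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["Stainles", "aluminum", "galvanised", "bolt"]:
    print(raw, "->", fuzzy_standardize(raw))
```

Raising the cutoff reduces false matches at the cost of sending more terms to a human for review; that trade-off is a judgment call per category.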

Using the Extractor produces several deliverables. The first is a table of normalized technical attributes and values that describes the products unambiguously; the second is a collection of pattern examples, organized as a knowledge base, to be used in the future to auto-extract other products; the third is a set of lexicons that link raw data terms, in the context of a category, to the proper standard lexical terms.
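One way to picture a category-scoped lexicon is sketched below in Python: the same raw term can map to different standard terms depending on the category, with optional global entries as a fallback, and the entries can be saved as JSON for reuse in later runs. The `Lexicon` class, the `"*"` global marker, the JSON layout and the sample entries are all hypothetical illustrations, not the Extractor's storage format.

```python
import json
from typing import Dict, Optional, Tuple


class Lexicon:
    """Links raw data terms to standard terms, scoped by category (hypothetical design).
    Entries stored under GLOBAL apply to every category unless a category overrides them."""

    GLOBAL = "*"

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], str] = {}

    def add(self, raw: str, standard: str, category: str = GLOBAL) -> None:
        self._entries[(category, raw.lower())] = standard

    def lookup(self, raw: str, category: str) -> Optional[str]:
        """Prefer the category-specific entry, then fall back to a global one."""
        key = raw.lower()
        if (category, key) in self._entries:
            return self._entries[(category, key)]
        return self._entries.get((self.GLOBAL, key))

    def to_json(self) -> str:
        """Serialize the entries so they can be stored and reused later."""
        rows = [
            {"category": c, "raw": r, "standard": s}
            for (c, r), s in sorted(self._entries.items())
        ]
        return json.dumps(rows, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Lexicon":
        lexicon = cls()
        for row in json.loads(text):
            lexicon.add(row["raw"], row["standard"], row["category"])
        return lexicon


lexicon = Lexicon()
lexicon.add("cap", "capacitor", category="electrical components")
lexicon.add("cap", "end cap", category="piping")
lexicon.add("dia", "diameter")  # global entry: applies to all categories

restored = Lexicon.from_json(lexicon.to_json())
print(restored.lookup("CAP", "piping"))
print(restored.lookup("dia", "electrical components"))
```

Keying entries by category is what lets "cap" resolve differently in piping and in electrical components.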

 

FACTORY Tools: Formalizer, Clusterer, Classifier, Extractor, Deduper, Descriptor

FACTORY Supporting Tools: Taxonomy Manager, Lexicon Manager, Source Manager, Export Manager