GammaWare News Edition

GammaWare Differentiating Factors

More Accurate

GammaWare offers the highest precision and recall in the automatic categorization market today. Precision is the percentage of documents placed in a category that actually belong to that category. Recall, or “coverage,” is the percentage of all documents in the repository belonging to a given category that the software actually placed in that category. In other words, GammaWare is not only best at getting classification right – it also misses fewer documents when building a category. In benchmark tests checking both precision and recall, GammaWare attained significantly higher scores than competing products.
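As a minimal illustration of the two measures, the sketch below computes precision and recall for a single category from sets of document IDs. The function name and the sample IDs are invented for this example and are not part of GammaWare.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: IDs the software placed in the category
    relevant: IDs that truly belong to the category
    """
    assigned, relevant = set(assigned), set(relevant)
    correct = assigned & relevant  # documents that were placed correctly
    precision = len(correct) / len(assigned) if assigned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 documents assigned, 5 truly relevant, 3 in common.
p, r = precision_recall({"d1", "d2", "d3", "d9"}, {"d1", "d2", "d3", "d4", "d5"})
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Here one assigned document does not belong (lowering precision) and two relevant documents were missed (lowering recall).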

These results are made possible by GammaSite's state-of-the-art, patent-pending machine-learning algorithms, developed by GammaSite's scientific team. The team includes some of the world's leading machine-learning researchers, from the Technion (Israel Institute of Technology), the University of Toronto, the Hebrew University and Tel Aviv University.

Faster to Implement and Use

Due to its sophisticated machine-learning approach, GammaWare can achieve very accurate categorization with minimal setup. GammaWare requires around five example documents per category, compared with twenty or more for other solutions in its class. This makes GammaWare significantly less labor-intensive than competing products. Furthermore, unique features of the software, including the Category Suggestion Tool, reduce the time needed to manage categorization on a daily basis.

Not a "Black Box"

GammaWare's interfaces allow a human editor to see the factors affecting each classification decision, provide input and make corrections. This reflects GammaSite’s larger design goal – to combine the best software reasoning with human editorial insight. Many competing solutions function as "black boxes" that take in instructions and spew out categorization results.

Uses Existing Security Infrastructure

The GammaWare server never stores any of your content - it simply receives requests, encoded in XML, and sends back categorization results. Because no content is kept on the server, GammaWare adds no new store of sensitive material to protect, and you are free to continue using your existing security configuration.

In contrast, many categorization solutions store categorized documents and protect them with a proprietary security scheme. This approach offers questionable protection for your sensitive material, and makes integration more complex.

Language Independent

Because it is based on statistical machine learning, GammaWare ranks and compares words without using a predefined dictionary or grammatical rules. This makes the software language-independent. The addition of Unicode support strengthens this inherent multilingual ability, allowing GammaWare to handle text in any character set - from English to Arabic to Chinese.

Solves the Hierarchical Recall Problem

GammaWare is the only solution that circumvents the Hierarchical Recall Problem, which can dramatically reduce the accuracy of automatic categorization. The problem stems from the fact that in general, categorization software attempts to filter documents down the taxonomy tree, matching them to categories one level at a time.

To explain the problem, we'll take a hierarchical tree of categories with three levels. On the first level is the category "Europe," on the second level is "Scandinavia," and on the third level is "Sweden." If we assume the categorization system has a recall rate of 90%, 9 of every 10 documents about Sweden should be categorized into "Europe / Scandinavia / Sweden." In practice, the recall rate will be substantially lower.

The software starts by checking which documents belong in the "Europe" category. Given the 90% recall rate, 9 of 10 documents about Sweden are matched to the category "Europe." The software now checks which of these documents belong in the sub-category "Scandinavia" - but one document about Sweden has already been discarded. And so, even if the high recall is maintained, on average only 8.1 (9 × 90%) of the documents about Sweden will be classified into "Scandinavia."

On the third level, the problem gets even worse, because about two (1.9) of the ten documents about Sweden have already been discarded in error. The software checks whether the remaining 8.1 documents belong in the category "Sweden." Maintaining the same recall rate, only about 7.3 (8.1 × 90%) of the original ten documents about Sweden will be categorized correctly, meaning the recall rate in practice is a much lower 73%.
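The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the compounding effect, assuming the same 90% recall at every level and independent errors between levels; the function name is illustrative.

```python
def effective_recall(per_level_recall, depth):
    """Recall at a given depth when each level filters with the same recall."""
    return per_level_recall ** depth


# Walk down the example tree, starting from 10 documents about Sweden.
for depth, name in enumerate(["Europe", "Scandinavia", "Sweden"], start=1):
    kept = 10 * effective_recall(0.9, depth)
    print(f"{name}: about {kept:.1f} of 10 documents remain")
```

At depth three the effective recall is 0.9 × 0.9 × 0.9 = 0.729, which is the 73% figure quoted above. Deeper taxonomies compound the loss further.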

GammaWare solves the Hierarchical Recall Problem by using specially developed statistical algorithms to estimate directly which sub-category a document most probably belongs to. Documents belonging to categories deep within the tree are classified without being filtered down through the levels. The result is dramatically improved recall, which makes for better categorization results.
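The text does not describe GammaSite's algorithms, so the following is not their method. It is only a generic sketch of the idea of classifying a document against deep categories directly rather than level by level: each leaf category gets a score, the best leaf wins, and its ancestors follow from the path. The function name, the category paths and the scores are all hypothetical.

```python
def classify_flat(leaf_scores):
    """Pick the highest-scoring leaf path and return it with its ancestors."""
    best_path = max(leaf_scores, key=leaf_scores.get)
    parts = best_path.split(" / ")
    # Every prefix of the winning path is an ancestor category.
    return [" / ".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical scores for one document, one entry per leaf category.
scores = {
    "Europe / Scandinavia / Sweden": 0.82,
    "Europe / Scandinavia / Norway": 0.11,
    "Asia / East Asia / Japan": 0.07,
}
print(classify_flat(scores))
```

Because the decision is made once, at the leaf, an error at an upper level cannot discard the document before the deeper categories are considered.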
