With its acquisition last week of AlchemyAPI, IBM’s Watson Group added new tools and expertise to its already rich and growing array. AlchemyAPI’s technology complements and expands the core IBM Watson features. It collects and organizes information with little preparation, making it a quick on-ramp for building a collection of information that is sorted and searchable. It works across subject domains, and doesn’t require the domain expertise that the original Watson required. Its unsupervised deep learning architecture is designed to extract order from large collections of information, including text and images.
In contrast, the original Watson tools used to understand, organize and analyze information demand some subject expertise. For best results, experts are required to build ontologies and rules for extracting facts, relationships and entities from text. The result is a mind-boggling capability to hypothesize, answer questions, and find relationships, but it takes time to build and is specific to a particular domain. That is both good and bad: the tools provide a depth of understanding, but at a significant cost in time to get up and running. The Watson tools are also text-centered, although significant strides have been made to add structured information as well as images and other forms of rich media.
AlchemyAPI was designed to solve precisely these problems. It creates a graph of entities and the relationships among them, with no prior expectations for how this graph will be structured. The graph is entirely dependent on what information is in the collection. Again, this is both good and bad. Without subject expertise, topics that are not strongly represented in the collection may be missing or get short shrift. Both approaches have their limits, as well as their advantages. Experts add a level of topic understanding, a set of expectations about what might be required to round out a topic. Machines don’t. But machines often uncover relationships, causes and effects, or correlations that humans might not expect. Finding surprises is one of the strongest arguments for investing in big data and cognitive computing.
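To make the idea of a schema-free entity graph concrete, here is a minimal sketch in Python. It is not AlchemyAPI’s method, which relies on deep learning. It simply counts how often entities appear together in the same document, and it assumes the entities have already been extracted. The function name `build_entity_graph` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_entity_graph(docs):
    """Count entity co-occurrences across documents.

    docs is a list of entity lists, one per document (assumed to be
    extracted already). The result maps an alphabetically ordered pair
    of entities to the number of documents mentioning both. No schema
    is supplied: the edges come only from what the collection contains.
    """
    edges = Counter()
    for entities in docs:
        unique = sorted(set(entities))  # ignore repeats within a document
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical three-document collection.
docs = [
    ["IBM", "Watson", "AlchemyAPI"],
    ["IBM", "Watson"],
    ["Watson", "Jeopardy!"],
]
graph = build_entity_graph(docs)

# Strongest relationships first.
for pair, count in graph.most_common():
    print(pair, count)
```

Even in this toy version, the trade-off shows up: a pair that appears in only one document gets a weak edge, and an entity absent from the collection never enters the graph at all.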
In this acquisition, Watson continues the path that helped it win Jeopardy!—by combining every possible tool and approach that might increase understanding. IBM can now incorporate multiple categorizers, multiple schemas, multiple sources, and multiple views and then compare the results by the strength of their evidence. This gives us more varied and rich results since each technology contributes something new and crucial. Like the best human analysts, the system collects evidence, sorts through it, weighs it, and comes to more nuanced conclusions.
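A rough sketch of what comparing results by strength of evidence can look like, written in Python. This is an illustration under simple assumptions, not Watson’s actual scoring pipeline: each categorizer is assumed to return a confidence between 0 and 1 for each label it proposes, and the combined evidence is a weighted sum. The name `combine_evidence`, the weights, and the scores are all hypothetical.

```python
from collections import defaultdict


def combine_evidence(candidate_sets, weights=None):
    """Merge scored labels from several categorizers.

    candidate_sets is a list of dicts mapping label -> confidence,
    one dict per categorizer. weights optionally sets how much each
    categorizer is trusted (default: equal trust). Returns a list of
    (label, combined_score) pairs, strongest evidence first.
    """
    if weights is None:
        weights = [1.0] * len(candidate_sets)
    totals = defaultdict(float)
    for candidates, weight in zip(candidate_sets, weights):
        for label, score in candidates.items():
            totals[label] += weight * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Three hypothetical categorizers looking at the same document.
ranked = combine_evidence([
    {"finance": 0.6, "sports": 0.2},
    {"finance": 0.5},
    {"finance": 0.3, "sports": 0.4},
])

for label, score in ranked:
    print(f"{label}: {score:.2f}")
```

A label backed by several independent sources outranks one that a single source rates highly, which is the intuition behind weighing evidence rather than trusting any one technology.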
The Watson platform adds a major piece to information systems that is often unsung. It orchestrates the contributions of the technologies so that they support, balance and inform each other. It feeds back answers, errors, and user interactions to the system so that Watson learns and evolves, as a human would. In this, it removes some of the maddening stodginess of traditional search systems that give us the same answers no matter what we have learned. In seeking answers to complex, human problems, we need to find right answers, perhaps some wrong answers to sharpen our understanding, and certainly the surprises that lurk within large collections. We want a system that evolves and learns, not one that rests on the laurels of a static, often outdated ontology.
Mirroring this technology architecture, IBM’s Watson Group similarly requires a group of closely knit, strong-minded people who are experts in their separate areas of language understanding, system architecture, voting algorithms, user interaction, probability, logic, game theory, etc. AlchemyAPI contributes its staff of deep learning experts, who are expected to join the Watson Group. It also brings its 40,000 developers worldwide, who will broaden the reach and speed the adoption of cognitive computing.
Big Data and Cognitive Computing: The Next Industrial Revolution? updates the trends we covered in The Answer Machine, published by Morgan & Claypool last year. This webcast on Jan. 30, 2014 was given to the Cornell Entrepreneur Network, but was open to all. You can listen to the recording at https://cornell.webex.com/cornell/lsr.php?RCID=616468230cc9b30a45ddd07d778325e2.
In updating the book, we found that the nascent trends we discussed in 2012 have quickly exploded. Applications that aggregate information and integrate technologies are becoming common. Task-centered design is almost a requirement. The market, driven by the buzz around big data and bombarded by information, has started to demand what vendors foresaw: there’s immense value in putting together the pieces from disparate sources, and we need help in doing this. IBM’s Watson may have been the first to define cognitive computing, but we see others positioning themselves in this marketplace as the interest grows. We’ll be covering some of these new companies in the months ahead.
During the past year, as we have worked with vendors and technology buyers, we have found that one of the most difficult concepts to get across is probabilistic computing. Where does it fit in the current IT landscape? Does it replace traditional BI? We expect to explore this topic as well in the coming months. Please contact me directly if you’d like to discuss it in depth, or to schedule a briefing for your company. I can be reached at firstname.lastname@example.org.