During our research on search in the cloud, we’ve been collecting case studies on uses of SaaS-delivered search. Companies give us two major reasons for moving their search to a cloud-based model: first, they need a scalable, flexible model that can vary with the demands of the business, and second, search is not their core business, so they prefer to rely on outside experts who can deliver a solid, reliable foundation on which they can build specialized applications.
Businesses that are dynamic information exchanges require this kind of scalable reliability. They need to make their information available quickly and cater to the dynamics of their users. Search is critical to their business. Prezi (prezi.com) is a good example of this kind of use. This cloud-based software company enables its customers to brainstorm and collaborate, create unusual presentations, and share the results, no matter their location or device. Their search needs at this stage are basic: good matching of queries to documents and quick updating of their index. They started with about 200 million documents, but the volume is expected to grow to 1 terabyte, doubling annually. Prezi did not want to hire or develop the expertise to build search from scratch, and they needed flexible, scalable search to match their growing business. Their customers need to find materials that both they and others have developed, and they want to find images by topic without the time-consuming delays of creating and standardizing tags.
To make its materials searchable quickly and easily, Prezi developed a database of images that are associated with the text in the same slide. The contents change constantly, however, and they need to upload those images and make them searchable automatically using the related text. Furthermore, they anticipate adding and indexing new sources. For this purpose, they envisioned using search as “a materialized view over multiple sources.” In other words, a single gateway to all their information.
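As a rough illustration of the idea (not Prezi’s actual implementation, and not the CloudSearch API), the sketch below indexes images by the text of the slide they appear on and answers queries across several sources through one lookup. The class name, the source names, and the image IDs are all hypothetical, and a simple in-memory inverted index stands in for a real search service.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class SlideImageIndex:
    """Hypothetical in-memory index: slide text terms -> (source, image_id)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, source, image_id, slide_text):
        # The image inherits the terms of its neighboring slide text,
        # so no manual tagging is needed.
        key = (source, image_id)
        for term in tokenize(slide_text):
            self._postings[term].add(key)

    def search(self, query):
        # Return images whose slide text contains every query term,
        # regardless of which source they came from.
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(
            *(self._postings.get(term, set()) for term in terms)
        )
        return sorted(hits)


index = SlideImageIndex()
index.add("presentations", "img-001", "Quarterly sales growth in Europe")
index.add("presentations", "img-002", "Team offsite in the mountains")
index.add("templates", "img-101", "Sales pipeline overview")

print(index.search("sales"))
print(index.search("sales growth"))
```

Adding a new source here means calling `add` with a new source name; the query side does not change, which is the sense in which search acts as a single view over multiple sources.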
To accomplish this, they needed stable, reliable and expandable search. The materials had to be accessible to their users no matter their device or location. Peter Neumark, a Prezi software engineer, told us that they were looking for search that they could “pay for, use and forget about.”
Selecting a Search Infrastructure
Prezi’s previous search solution was slow and didn’t function well enough as a key-value store. They also required a solution that allowed them to relate an image to its neighboring text easily. They decided to look at Amazon’s CloudSearch to solve these problems and deliver relevant material to searchers quickly and reliably. In other words, they were looking for search that “just worked.” They didn’t want to maintain it themselves, and they wanted to continue using the AWS APIs, which they already knew and liked.
When they did head-to-head testing, they found that CloudSearch was cheaper, faster, more reliable and expandable, and easier to sync with their Amazon DynamoDB database. They liked its auto-scaling features, which would grow with their data and their business.
Rolling out CloudSearch and Future Plans
Prezi are “happy campers”. They deployed CloudSearch in 3 weeks, and are seeing lower cost, lower latencies, and virtually no need to pay attention to their basic search foundation. Their next step will be to roll out additional domains and sources. They like the idea of adding domains rather than changing the initial schema. They will also make the search function more visible on their site, now that they no longer need to worry about its reliability and speed.
The open source software movement raises difficult questions for CIOs:
- Is open source software “free”?
- If not, what are its costs and risks?
- Does using open source software save time in deploying an application?
- What uses are best suited to open source software?
The answer to all of these questions is, unfortunately, “it depends”. Using open source software effectively depends on the type of application and on the expertise of the developers. It also requires the same kinds of trade-offs that are necessitated by any choice of software: how customized does it have to be? How accurate? How scalable? How usable and for which types of users? This is particularly true in the realm of search and text analytics because both of these applications are language dependent, with all the nuances, variety and complexity that language brings.
We find widespread use of open source components by commercial software vendors. They use open source search or text analytics as a starting point. Then they add in the vocabularies, domain knowledge, tools and widgets, connectors to other applications and information stores, process knowledge and user interaction design to create usable and scalable applications that are suited to a specific purpose. We also find sophisticated enterprises with enough skilled developers, computational linguists, and interaction designers using open source software to give them the custom applications they need. There is no doubt that, as open source applications have become more robust and the tools to use them have become available, they have become an attractive alternative for many enterprises. But are they “free?” Not if you consider the time, labor and expertise needed to make them an integral, useful part of the enterprise stack.
On Nov. 6th, I’ll be chairing a one-day program on open source search software in Chantilly, VA, near Washington, DC, that will discuss these questions. We’ve invited some major open source search developers from Elastic Search, Sphinx, Lucene, and Solr, as well as vendors who have embedded open source software in their products. Practitioners will also discuss their experience with developing applications using open source. Eric Brown, Director of Research for IBM Watson, which embeds multiple open source products, will give the keynote, and Donna Harman from NIST’s TREC will discuss how to evaluate search effectiveness. Government employees can register for the event free. Others will get a discount on the registration fee by entering feldman2013 when they register.
In addition, we are collecting data on the use of both commercial and open source search and text analytics, and we hope that you will fill in our survey. Results will be tabulated, and all respondents will receive a summary of what we find. You can find the survey at: https://www.surveymonkey.com/s/Synthexis
I hope to see you in November.