Search Tools Analysis

Google Custom Search Engine (CSE) 2007

InfoToday NewsBreak
by Avi Rappoport
July 30, 2007
(some rights reserved)

Google has launched the Google Custom Search Business Edition, a search service designed for small-business Web sites. Using the Google.com search index, this service is available for as little as $100 per year. Simple enough for any site owner, this hosted search builds on the free Custom Search Engine (CSE), which has been available since fall 2006. The Business Edition does not show advertising, allows sites to opt out of the Google branding requirement, provides email technical support, and offers a powerful XML API (application programming interface) for queries and results. However, search is limited to pages indexed by the Google.com Web search engine.

Site search is a valuable tool for sites with more than a few dozen pages. It is an alternate form of navigation that lets site visitors avoid deep drill-downs and confusing labels, and it picks up new vocabulary as soon as the content is posted and the index updated. "Millions of businesses have a Web presence but offer users no ability to search their site," said Dave Girouard, vice president and general manager of Google Enterprise. "While many of these businesses invest in search advertising and search engine optimization to help customers find their business, customers are left on their own to navigate content once they land on a site."

With remote hosted search (Software as a Service, or SaaS), sites can add a search field to each page; when a user types a query, the form sends it to the search service, which retrieves matching pages from that site stored in the index and returns a results list to the site. Site visitors may never notice that the search results come from somewhere else.

The main example site mentioned, HolidayHomeRental, shows how a full-text search engine, such as the Google CSBE, can supplement a more powerful but more difficult forms-based search interface. Customers do not have to guess which fields are populated and available for searching; they can simply enter a location or other criteria. Andy Steggles, president of the site, reports that a week after the Google CSBE search went live, the number of referrals had increased by 30 percent and customer service requests had decreased significantly.

The CSBE takes advantage of the features of the Google.com Webwide search engine. The crawler traverses many kinds of links, including iframes, most image maps, and even some JavaScript links, although it cannot index content generated by JavaScript. It can index the most popular file formats: HTML, XML, text, PostScript, RTF, PDF, Lotus, MacWrite, MS Word, Excel, and PowerPoint. Extensive character set and language recognition allow site visitors to find relevant documents in any language. The service uses Google’s successful spell-checking function, retrieves matches from within the Google index (with optional Safe Search filtering for eight languages), hides duplicate pages, and ranks using the same relevance algorithms perfected by the Web search service. Scaling and reliability should not be a problem, because Google has proven its expertise in these areas.

As with other hosted services, such as Visual Science Search (formerly Atomz), FreeFind, FusionBot, Webinator, SLI Search, Spiderline, and Blossom, there is no need to install software or hardware. Because the CSBE uses the Google.com index, it requires no additional bandwidth for indexing and just a small amount for serving search results. Google has provided hosted search services for several years under various names, and in testing it’s always been responsive and reliable.

The Total Training Network site implemented the CSBE with little effort, according to Mike Begin, network and system administrator. "It was really easy to set up and it took us ten minutes to do so." He notes that the site had been using an earlier version of the Google service.

All administration is done through the CSBE Web interface. Search administrators can specify what the search will cover, with include and exclude listings for sites, hosts, and even individual URL strings, using wildcards. Other options include "refinements"—filters based on query terms and URL patterns that can limit search to a specific directory or site—and "subscriptions" (Search Suggestions or Best Bets), which appear at the top of search results for specified terms.
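To make the include and exclude listings concrete, here is a minimal sketch of how wildcard patterns could decide whether a URL falls inside the search scope. The pattern strings, the example hosts, and the in_scope helper are all hypothetical, and shell-style matching from Python's fnmatch module stands in for whatever pattern syntax the CSBE actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only; not Google's actual syntax.
INCLUDE = ["www.example.com/*", "docs.example.com/guides/*"]
EXCLUDE = ["www.example.com/private/*", "*/print/*"]


def in_scope(url: str) -> bool:
    """Return True if url matches an include pattern and no exclude pattern."""
    included = any(fnmatchcase(url, pattern) for pattern in INCLUDE)
    excluded = any(fnmatchcase(url, pattern) for pattern in EXCLUDE)
    return included and not excluded


if __name__ == "__main__":
    for candidate in (
        "www.example.com/products/a.html",
        "www.example.com/private/notes.html",
        "docs.example.com/guides/print/p1.html",
    ):
        print(candidate, in_scope(candidate))
```

In this sketch an exclude match always wins over an include match, which is one reasonable reading of the include and exclude lists described above, not a documented rule.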

Interactive look-and-feel customization is limited to setting a site logo and text colors and removing the Google logo. There’s no template to place site design and navigation elements into the results page, although the service can be called via a JavaScript link and shown in an iframe. Those with technical resources can use a special Google Search API to request XML results, up to 20 at a time. The XML can then be rendered by a presentation layer written in a language or framework such as Perl, PHP, Ruby on Rails, or Java. In this case, there will be no change to the URL and no indication that the results are served by any other site.
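As a rough illustration of the XML route, the sketch below parses a results document and renders it as an HTML list that a site could drop into its own page template. The element names (results, result, url, title, snippet) and the sample data are assumptions made for this example and are not taken from Google's API documentation; only the cap of 20 results per request comes from the description above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Sample response in an ASSUMED shape; the real element names may differ.
SAMPLE_XML = """<results>
  <result>
    <url>http://www.example.com/a.html</url>
    <title>Q&amp;A page</title>
    <snippet>First matching page.</snippet>
  </result>
  <result>
    <url>http://www.example.com/b.html</url>
    <title>Page B</title>
    <snippet>Second matching page.</snippet>
  </result>
</results>"""

MAX_RESULTS = 20  # the article says results arrive up to 20 at a time


def parse_results(xml_text, limit=MAX_RESULTS):
    """Turn the XML text into a list of dicts, one per result."""
    root = ET.fromstring(xml_text)
    hits = []
    for node in root.findall("result")[:limit]:
        hits.append({
            "url": node.findtext("url", default=""),
            "title": node.findtext("title", default=""),
            "snippet": node.findtext("snippet", default=""),
        })
    return hits


def render_html(hits):
    """Render the hits as an HTML list, escaping all text from the XML."""
    items = [
        '<li><a href="{}">{}</a><p>{}</p></li>'.format(
            escape(hit["url"]), escape(hit["title"]), escape(hit["snippet"])
        )
        for hit in hits
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_html(parse_results(SAMPLE_XML)))
```

Because the site's own server fetches the XML and produces the HTML, visitors stay on the site's URL, which matches the behavior described above.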

The site Justia.com has implemented the CSBE for more than 1 million pages, according to CEO Tim Stanley. The company had tried many open source search engines but found that the Google service provided better relevance and that combining its own Perl code with the CSBE XML was very easy to implement.

There are some limits to this service. As with the Google.com search engine, there is no way to index pages protected by passwords, cookies, session IDs, or other access control. Google explicitly will not guarantee that it will crawl all of the pages of a particular site, and there is no way to update the index on demand (or even frequently). The company has also stated that using this service will not improve a site’s position in the Google.com search results in any way. Many of the other hosted services do not have these update limits, as they create indexes specifically for the sites to be searched and provide more control over the indexing schedule.

The CSBE does not support showing advertising on results pages. If advertising is desired, sites can connect their AdSense accounts to the free Google Custom Search Engine and profit from advertising clicks in that manner.

Reports are limited to listings of the most frequent queries and the number of queries per time period (day/week/month/overall). From all indications, this does not include any information about the queries that found no matches in the site—a significant omission. The search can be incorporated into the Google Activity Monitor site traffic analysis tool for a more complete picture of site visitor behavior and conversion tracking.

The documentation is helpful, but some of the features, such as subscriptions and iframes, are not currently covered well. CSBE customers, unlike customers of the free CSE, have access to individual email support. Larger sites purchasing via direct sales can also access telephone support.

Pricing for the CSBE is $100 per year for up to 5,000 pages and $500 per year for up to 50,000 pages (both payable by credit card via Google Checkout). Potential customers with larger sites should contact the company; yearly cost starts at $15,000 for 50,000 to 1 million pages. Nonprofits, universities, and government agencies can use the free CSE and opt out of advertising. For more control over the crawling and indexing of password-protected pages, Google offers hardware/software bundles—the Google Search Appliance and Mini Appliance.
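The published tiers can be summarized in a short lookup. This is only a sketch of the figures quoted above: the function name is made up for illustration, and sites above 50,000 pages return no fixed price because the article says those customers must contact Google, with yearly cost starting at $15,000.

```python
from typing import Optional

# Starting yearly price quoted for sites of 50,000 to 1 million pages.
LARGE_SITE_STARTING_PRICE_USD = 15_000


def yearly_price_usd(pages: int) -> Optional[int]:
    """Return the listed yearly price, or None when a sales quote is needed."""
    if pages < 0:
        raise ValueError("page count cannot be negative")
    if pages <= 5_000:
        return 100
    if pages <= 50_000:
        return 500
    return None  # larger sites: contact Google for a quote


if __name__ == "__main__":
    for count in (1_200, 5_000, 20_000, 300_000):
        price = yearly_price_usd(count)
        if price is None:
            label = "quote, from $%d" % LARGE_SITE_STARTING_PRICE_USD
        else:
            label = "$%d" % price
        print(count, "pages:", label)
```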

 


Page Modified 2007-07-30