Is Hosted Search Really Ready for Prime Time?

In the years I’ve spent in higher education, one universal truth I have found is that nothing quite moves a project along like when someone much more important and much less web savvy than you deems an issue worth addressing.  Such was the case only a couple months after I had started at the university, when the Director of Marketing noticed that new information she had put up on the site wasn’t coming up in search results, and the results that were hitting weren’t particularly relevant to the topic in the first place.  Thus, a mission was born, to find a way to make our search better, and to do it NOW.  That’s the other thing about people higher up than you: when they say jump, generally you jump.

At the time (approximately three years ago), we had been using the pretty straightforward Google search for web sites.  It amounted to putting a box on your page that submitted to Google, restricting results to your domain.  You couldn’t really do anything else with it then besides add a banner to the top.  So began the odyssey.  Most of the major players offered a basic site search back then, all of which were fairly equally crippled.  The Google Search Appliance was (and still is) crazy expensive and totally overkill for our site.  The IBM/Yahoo product OmniFind was still a few months from launch (nor did we have hardware to run it on at the time).  The Thunderstone Parametric Search Appliance just looked a little… well, no one I know had ever heard of them, and their website wasn’t (and still isn’t) something that inspires my confidence.  The Google Mini, on the other hand, was cheap, more than adequate for our site size, and was getting good reviews.  Not to mention that the money to get it was ready and available.  All that made the choice pretty easy for us, so we dove in.

Now, fast forward a couple years.  We are still using our Mini.  In fact, I just upgraded to 5.0.4 on Monday.  I’ve never had a lick of trouble with it and became a pretty quick fan of it.  This year at eduWeb I had the good fortune to share my experience with a couple people, and the conversation generally drifted towards: “Why is that better than Google Site Search?”  Originally, the Mini offered a ton of unique features, such as custom collections, theming, or the ability to export results as XML.  The past year has seen a growth in the availability and features provided by free, hosted search solutions.  Yahoo BOSS looks to be an API that wants to take a serious swing at the hosted search crown.  Google’s Custom Search Business Edition (CSBE), AKA Google Site Search, is also offering businesses and schools the opportunity for search with many of the features of the Mini, like the ability to remove branding and ads and call results as XML (note: Google Site Search is free for universities).

With all these new options, is the Mini even a worthwhile investment now?  We’re coming up on the end of our support term, so I figured this was a prime time to evaluate the field.  My short answer is: Yes, it still is.  My long answer also happens to be yes.  See, search is important. Search is doubly important for universities because we have so much crap out there, and so many different topics to address (many of which also happen to be crap, but you can’t tell that to the people putting it out there).  A Mini now costs $3000 with 2 years of support, which would be equal to six years of equivalent CSBE service (assuming you had to pay) which prices out at $500 a year for 50,000 pages.  Obviously Google isn’t trying to mothball its own products, so where does the Mini make up that cost?

First, I think there’s huge value in crawling.  Remember our original problem?  Content was not making it into the search results fast enough.  With the Mini I can schedule crawls, or just set it on continuous mode and let it go nuts.  Using nightly scheduled crawls, I ensure that any content added to the web site shows up in search within 24 hours, and usually faster than that (unless some crazy person is up and adding content to the site at 12:01 AM).  Going through the Webmaster Tools, I can only tell Google to crawl our site at a Normal or Slower rate.  We don’t even rate high enough to get the Faster crawl rate option.  So users of Site Search are pretty well cornered on the matter.  Once I crawl our site with the Mini, I can have it output a sitemap that I feed to Google’s spider to help with their indexing as well, so the benefit becomes twofold.
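For anyone who hasn't looked inside a sitemap before, here is a minimal sketch of what one amounts to. This is not how the Mini produces its export (I'm not showing its mechanism here); it's just an illustration of the sitemaps.org format built from a list of crawled URLs. The function name `build_sitemap` and the example.edu URLs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a sitemaps.org-style XML string from an iterable of page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):  # de-duplicate and keep the output stable
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URLs standing in for whatever a crawl turned up.
    crawled = [
        "https://www.example.edu/",
        "https://www.example.edu/admissions/",
        "https://www.example.edu/",
    ]
    print(build_sitemap(crawled))
```

The resulting file is what you would point a search engine's webmaster tools at, so a fresh local crawl can help the public index too.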

Next up, raise your hand if you have an intranet, or otherwise secured information not available to the public.  All of you can pretty well scratch CSBE/SiteSearch off your short list if you’re looking for a way to dig through it.  If you want to index any kind of protected content, you’ll have to go with an actual hardware solution, as both the Mini and GSA support mechanisms to crawl and serve content that is behind a security layer.  This is a great option if you buy a Mini, use up the initial two years of support, then buy a second one: use one for internet and the other for intranet.

You’re also going to find that you are capable of pulling more valuable metrics out of the Mini than what you get with CSBE/SiteSearch.  Granted the standard “what are people searching for” question is easily enough answered.  But what about “what are people searching for that isn’t returning results?”  That can be equally as valuable in a lot of cases.  And while Site Search allows for search numbers by month and day, the Mini can go down to the hour, as well as show you your current queries per minute.  It’ll even keep tabs on how many pages it’s crawling currently, how many errors it found, and email you about it all.  All the reports can be saved out as XML, naturally, so you can mix and match datasets as you need for custom reports.
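To make the "searches that return nothing" idea concrete, here is a rough sketch of the kind of report I mean. It assumes you have already pulled your search log into simple (query, result count) pairs; that shape is my assumption for the example, not the Mini's actual XML report format, and `zero_result_queries` is a made-up name.

```python
from collections import Counter


def zero_result_queries(log_rows, top_n=10):
    """Return the most common queries that came back with no results.

    log_rows is an iterable of (query, result_count) pairs.
    """
    misses = Counter()
    for query, result_count in log_rows:
        if result_count == 0:
            # Normalize case and surrounding spaces so variants count together.
            misses[query.strip().lower()] += 1
    return misses.most_common(top_n)


if __name__ == "__main__":
    # Hypothetical log rows for illustration only.
    rows = [("Parking", 0), ("parking ", 0), ("tuition", 12), ("bursar", 0)]
    for query, count in zero_result_queries(rows):
        print(query, count)
```

A list like this tends to point straight at content that is missing, or at content that exists but uses different words than your visitors do.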

And I have one word for you: OneBox.  Mini has it, thanks to a trickle down effect from the GSA - hosted Google options do not have it.  The OneBox essentially allows you to add in custom search results based on query syntax, and tailor the styling of the results.  You see this all the time at Google, for instance when you type in a phone number, or FedEx tracking number.  As you can see, these results need not come from your Google Mini search index.  It can come from other collections, or other sources entirely.  In the screenshot to the right, you can see a mock up of a OneBox result that matches a name format and returns contact information along with the standard search results.  Uses for this are many, and can span anything you might store in databases, such as course listings, book ISBNs, names, weather (if you have campuses in different cities), room information, etc.  Anything that you can define some kind of search pattern for.
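The name lookup idea boils down to "match a pattern, then go fetch something from another source." Here is a small sketch of that logic only. It is not the OneBox API or configuration; the `DIRECTORY` data, the pattern, and the `onebox_lookup` function are all hypothetical stand-ins for whatever database sits behind your own setup.

```python
import re

# Hypothetical directory; a real one would be a database or directory service.
DIRECTORY = {
    ("jane", "doe"): {"phone": "555-0100", "office": "Hall 101"},
}

# A very simple "First Last" pattern: exactly two alphabetic words.
NAME_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s+([A-Za-z]+)\s*$")


def onebox_lookup(query):
    """Return contact info if the query looks like 'First Last' and is known."""
    match = NAME_PATTERN.match(query)
    if not match:
        return None  # query does not fit the name format at all
    key = (match.group(1).lower(), match.group(2).lower())
    return DIRECTORY.get(key)  # None when the name is not in the directory


if __name__ == "__main__":
    print(onebox_lookup("Jane Doe"))
    print(onebox_lookup("tuition costs"))
```

Queries that fit the pattern but match nobody simply fall through to the normal results, which is the behavior you want from any add-on result like this.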

On a quasi similar note, you can also link certain searches (or parts of searches) to keymatches.  These are commonly used for ads on Google that appear at the very top of search results (usually highlighted light yellow with the “Sponsored Link” caption), but you can use them to highlight a link that goes right to the automotive department when someone searches for something containing the word “auto.”  This is another feature unique to the Mini and GSA, and one more way to make sure searchers are presented with relevant links.  This is very useful in cases where a department doesn’t have a well optimized site, so it doesn’t show up first in a search for that department.
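If it helps to picture it, a keymatch is basically a lookup table from a trigger word to a promoted link. The sketch below shows that idea only; on the Mini you set these up in the admin interface, not in code, and the `KEYMATCHES` table, the example.edu URL, and `keymatch_links` are assumptions made up for illustration.

```python
# Hypothetical keyword -> (title, url) table for promoted links.
KEYMATCHES = {
    "auto": ("Automotive Department", "https://www.example.edu/automotive/"),
}


def keymatch_links(query):
    """Return promoted (title, url) pairs whose keyword is a word in the query."""
    words = query.lower().split()
    return [link for keyword, link in KEYMATCHES.items() if keyword in words]


if __name__ == "__main__":
    print(keymatch_links("Auto repair classes"))
    print(keymatch_links("automatic enrollment"))
```

Note that this sketch matches whole words, so "automatic" does not trigger the "auto" link. Whether you want whole word, phrase, or exact query matching is a choice you would make per keymatch.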

Ultimately, it’s a judgment call whether or not these features are worth the money to you.  At $3000, you’re basically paying $1000 for the server itself and $1000 for each of the two years of support.  You can’t buy the unit without support, though, so that notwithstanding, you’re getting a full featured search box with support for about twice the cost of a good PC.  If you have more than 50,000 pages to index, though, you’ll find that price goes up.  At the same time, if you do have over 50,000 pages, there are a lot of other reasons not to go hosted, such as control over results, index freshness, result relevance, etc.  All these are always important, but they become even more so the bigger your site is.  Consider: if you have half a million pages on your site, and you need to make sure people find the needle they need in that haystack, would you rather have some control over that, or cross your fingers and hope Google gets it right?

My end impression is that Google’s Site Search is a great little tool for small businesses that are dealing in a few thousand pages, who can’t afford a server, or who don’t have the resources to maintain it.  Keeping up the server isn’t an involved job at all, but does require someone capable of checking in on it monthly or so, at least.  But, as universities, we generally have the resources for such a tool, both financially and manpower-wise.  We’re also large enough to justify a dedicated box for such an important task.

If you’re still researching what’s right for you in hosted search, it might well be worth keeping an eye on Yahoo BOSS, though; it’s making some pretty cool claims on functionality.  OmniFind is also great free software if you already have the resources in place to run it (like a VMware cluster or other virtualized environment) and can function within its limitations (only having up to five collections being the big one).  Just remember, search is possibly the single biggest tool on your website behind maybe your portal, and it deserves due process to get the treatment and attention your users expect and deserve.

21 Responses to “Is Hosted Search Really Ready for Prime Time?”

  1. Says:

    Does that Google Mini say “Summon Ninja” on the side? :)

    I guess I have one additional question, what about schools that have their whole site hosting off campus? I don’t have a rack to put a Google Mini anywhere. Have you heard about Hosting companies that install and manage these?

    A few weeks ago at eduWEB was the first time I had even heard of the mini, I know embarrassing to admit.

  2. Says:

    Yes, it does, because if you turn off my Google Mini, a ninja will be summoned to come and dispose of you. And yes, that is a sticker on my actual Mini.

    Depending on who hosts the campus’s site, you might be able to ask the company if they would run one for you. There are some people out there that will host them though: https://www.dedicatedserver.com/solutions/hostedSearch.cfm

    But, the cost of some of those (I’ve seen it as high as $350/month), might be easily offset just by buying a Mini and jamming it in a closet with power and an ethernet connection.

  3. Says:

    Plus if you buy the Mini, you get a free T-shirt.

  4. Says:

    But my T-shirt was an extra large, and those that know me, know that I am not an extra large sorta guy.

  5. Says:

    Michael, this is an amazing post. You’re sharing some valuable information. My campus, like a good number of others, is using standard Google search. This is an interesting option.

  6. Says:

    Thanks. You know what I’ve told people in the past: considering buying a Mini isn’t like investing $25K in a SAN or something like that. It’s $3000, roughly the cost of a couple good PCs. So the cost of the hardware is pretty worthwhile as far as such things go. Consider what some folks spend on CRM apps, or CMSs, or such things.

  7. Says:

    In regards to testing the social media sites you use, Todd recommended searching for other profiles hosted on said sites. Email Marketing Software

  8. Says:

    This was a great article. I was definitely wondering what was the advantage of using a mini over the GCSBE. We just implemented our GCSBE, pulling all the results via xml into our own page. We married it with our current keyword search database: https://www.edinboro.edu/search/

  9. Says:

    “Just remember, search is possibly the single biggest tool on your website behind maybe your portal, and it deserves due process to get the treatment and attention your users expect and deserve.”

    Really? You have the numbers to support this?

  10. Says:

    I have the numbers from our campus’s analytics, which overwhelmingly support this. Logins to our portal lead search engine usage by about 12%. The site A-Z index comes in next (and I do consider a site index a “tool”). Since the start of the year, the search engine and its results pages (cumulatively) rank 3rd and 4th respectively behind the home page and the portal login.

    And, since one could argue that the portal is for established members of campus (faculty, student, or otherwise), the search engine would reasonably be a far more useful tool to visitors and prospective students in the grand scheme, since at least in our case, the portal serves them no purpose. That might vary elsewhere, but still holds true to my original statement that if the importance of search falls anywhere, it’s right behind a portal tool.

    Also consider from a qualitative point of view, how many institutions integrate a search engine box as part of their templates, meaning as a general tool it’s available directly on all main pages of the site. What other site tools do you see used that receive that kind of penetration? People cannot use information they cannot find.

  11. Says:

    Those are interesting numbers.

    How do your local site searches compare to those directly from Google/Yahoo/etc?

  12. Says:

    I can pull some actual stats tomorrow on that. I can tell you from memory that organic results from Google almost exclusively come from general searches for just the school, with an exception on The Zimmerman Telegraph (a faculty page set up on the topic is pretty popular).

    Local searches that consistently hit the top 10 (other top ten queries tend to be seasonal) are things like tuition, costs, housing, FAQs, and calendars (almost always #1).

    The calendars one is particularly interesting, because it’s a pretty important tool as far as information dissemination goes, and there’s a feed and link right on our main page, but apparently people don’t see it.

    We’ve been using all this search data to help us with our redesign. Another good reason why search is important: it helps you address things that you *think* are obvious, but people have trouble with. The calendar should be a major tool, but we’ve clearly addressed it wrong, and search helped us know that.

  13. Says:

    Michael,

    Great post and a lot of very useful information. I have one question though. With google mini, are my results shared with google? For example, I want my indexed information to be given to google so that when someone searches google, these results will be available in there as well.

  14. Says:

    This was actually something I asked Google in our initial research. They informed me that no information is “sent home” for Google to use, and there’s no data sharing between the two.

  15. Says:

    We used to use Mini and it wasn’t cheap to get us the results that we needed. We have a custom search done for us at a good price and it has more than paid for itself in usability and search history reports.

  16. Says:

    Thanks for your review, Michael. You know, you aren’t the only one I’ve met who still prefers the Mini to other solutions. Comparing my colleagues’ opinions on this, I just thought it’s really powerful. You are right, and thanks once again for your detailed comparison.

  17. Says:

    Forgive me if someone asked this question already. I skimmed through the answers but didn’t see it.

    Can you explain the differences between Google CSE, Google Site Search, the Google Mini, and the Google Search Appliance? Our university — approximately 17,000 students and 40,000 employees — currently employs the GSA but didn’t know if this was the most cost-efficient option? What is your recommendation for a university of that size? Is using the Google Search Appliance overkill?

    • Says:

      Steven,
      Roughly speaking, CSE~Google Site Search and Mini~GSA. The first two are both hosted, the latter two are servers. The hosted solutions don’t give near the flexibility, and with the output of content a site like yours has, constant crawling is pretty valuable.

      For the servers, you probably made a good choice on a GSA. The Mini is a scaled down version of the GSA, and it maxes out at 300,000 documents (depending on your license). But it doesn’t do much beyond just crawl and serve results. The GSA allows for a lot more flexibility in how results are served, and allows you to use a lot more data in results. Granted, that’s assuming you leverage that stuff.

      That’s the bright line on what you need. The GSA is only powerful if you use the options. If you’re just crawling and searching, then you can save a lot of money on a Mini. But, in your case, you say they already have it, so I’d just sit back and enjoy it, and catch one of their webinars on how to do some of the higher powered stuff.

      • Says:

        Thanks for the info, Michael. Very helpful!

  18. Says:

    Thanks for the info! Very helpful

Trackbacks/Pingbacks

  1. Here’s a Big Fat Digest » Blog Post » SuperSatellite says:

    [...] So, the first big news is that I am now blogging “professionally” for .eduGuru, which is a web technology development blog sorta thing for higher education.  That means things that I used to talk about here will probably get diverted there.  Now, generic web stuff I’ll still post here, as well as things like dotCMS guides and my personal stuff will be here, so don’t panic or anything.  If anything, it means stuff around here will be less dry, and I’ll get to tickle your funny bone more.  And believe it or not, my first post is already up at .eduGuru.  I invite you to go read it, of course.  Here’s a snippet: Is Hosted Search Really Ready for Prime Time? In my years that I’ve now spent in higher education, one universal truth I have found is that nothing quite moves a project along like when someone much more important and much less web savvy than you deems an issue worth addressing.  Such was the case only a couple months after I had started at the university, when the Director of Marketing noticed that new information she had put up on the site wasn’t coming up in search results, and the results that were hitting weren’t particularly relevant to the topic in the first place.  Thus, a mission was born, to find a way to make our search better, and to do it NOW.  That’s the other thing about people higher up than you, when they say jump, generally you jump… [read the rest of this at .eduGuru] [...]