My Notes from Google Search Appliance Seminar
On Thursday, November 12, 2009, I attended a seminar at Google's Chicago office given by Bob Segal of Fig Leaf Software on the Google Search Appliance (GSA). The following are my notes from that two-hour presentation:
- The GSA returns search results according to the user's security permissions. This means that the GSA can index secured content and only include it in search results if the user performing the search has access rights to that content.
- Websites used as examples in the presentation:
http://www.heritage.org/
http://www.muschealth.com/
- Mozilla Firefox has a plug-in called URL Params that Bob used frequently in the demonstration. It presents all variables in the URL scope in an easy-to-view form, so you don't have to edit the query string in the address bar. https://addons.mozilla.org/en-US/firefox/addon/1290
After installing it I found that it also supports editing form fields, but an extra step is required to make the plug-in function as a Sidebar: chrome://urlparams/content/lib/addpanel.xul
- Search results can be categorized using both collections and meta data. The example shown was a column on the right-hand side of the page showing the number of results found in each category of the site, allowing the user to narrow their search focus.
- The term Universal Login was used and my understanding of the definition is that it is "using the same credentials across multiple authentication systems." This means the user still needs to log in to each system, but they are using the same username and password rather than a different schema for each. Not quite Single Sign-On, but a good workaround for disparate systems.
- Great resource for learning how to use and support a GSA: http://www.learngsa.com/
- Status and Reports > Crawl Diagnostics has a URL Status drop-down which can be used to limit the view to 404 Errors, Robot Exclusions, etc.
- You can view search results in XML format by removing the proxystylesheet parameter. I tested this and you need to remove the variable completely; just setting it to a blank value yields a 500 server error.
- You can add meta data to be displayed in the XML by setting &getfields=*
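To make the two notes above concrete, here is a minimal Python sketch that takes an existing search URL, drops proxystylesheet completely (rather than blanking it), and adds getfields=*. The host name gsa.example.com, the /search path, and the q parameter are assumptions for illustration and were not part of my notes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def to_xml_results_url(url: str) -> str:
    """Return the URL with proxystylesheet removed and getfields=* added."""
    parts = urlsplit(url)
    # Remove proxystylesheet entirely, even when it is present with a blank
    # value. Any existing getfields is dropped so it is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("proxystylesheet", "getfields")
    ]
    params.append(("getfields", "*"))
    # safe="*" keeps the asterisk literal instead of encoding it as %2A.
    query = urlencode(params, safe="*")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Hypothetical appliance URL, for illustration only.
example = (
    "http://gsa.example.com/search?q=port&site=default_collection"
    "&client=default_frontend&proxystylesheet=default_frontend"
)
print(to_xml_results_url(example))
```

The function rebuilds the query string rather than using string replacement, so it also handles the case where proxystylesheet is present but empty.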
- When searching, you can limit the search to specific meta data fields:
inmeta:author=[specific name]
inmeta:author~[partial name]
inmeta:title
inmeta:site
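As a rough illustration of the exact-match (=) and partial-match (~) forms listed above, the following Python sketch composes a query string from plain search terms plus inmeta filters. The helper names inmeta and build_query are hypothetical, and the author value is only an example; how the GSA treats values containing spaces is not something my notes cover, so the sketch sticks to single-word values.

```python
def inmeta(field: str, value: str, partial: bool = False) -> str:
    """Build one inmeta filter: '=' for an exact match, '~' for a partial match."""
    operator = "~" if partial else "="
    return f"inmeta:{field}{operator}{value}"


def build_query(terms: str, *filters: str) -> str:
    """Join the plain search terms and any inmeta filters with spaces."""
    return " ".join([terms, *filters]).strip()


# Example values are made up for illustration.
exact = build_query("port", inmeta("author", "Segal"))
partial = build_query("port", inmeta("author", "Seg", partial=True))
print(exact)
print(partial)
```

The resulting string is what would be typed into the search box; it still needs URL encoding if it is placed directly into a query string.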
- Three URL parameters select what you are searching, how it looks, and how the results are modified:
- client: This is your Front End, excluding the Output Format section, and controls how your results are modified, such as keywords, filtering, exclusions, etc.
- proxystylesheet: This is the Output Format section of your Front End and controls the page's design.
- site: This is your collection name.
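The three parameters above can be put together as in the following Python sketch, which assembles a search URL from a collection name and a Front End name. The host gsa.example.com, the /search path, the q parameter, and the default_collection and default_frontend values are assumptions used for illustration; substitute whatever your own appliance uses.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(
    host: str,
    query: str,
    collection: str,
    front_end: str,
    stylesheet: Optional[str] = None,
) -> str:
    """Assemble a search URL from the site, client, and proxystylesheet values."""
    params = {
        "q": query,           # the search terms (assumed parameter name)
        "site": collection,   # what you are searching
        "client": front_end,  # how the results are modified
    }
    if stylesheet is not None:
        # How the results look; leave it out entirely to skip the stylesheet.
        params["proxystylesheet"] = stylesheet
    return f"http://{host}/search?" + urlencode(params)


print(
    build_search_url(
        "gsa.example.com", "port", "default_collection",
        "default_frontend", stylesheet="default_frontend",
    )
)
```

Keeping stylesheet optional means the same helper can produce either the styled page or a request with no proxystylesheet parameter at all.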
- Next, Dictionaries were discussed. This is where you can set up synonyms to guide the user to industry-specific terms. The example used was the term "port", which means different things to the shipping industry, wine makers, and computer hardware builders. I believe this is the Query Expansion item under the Serving section for GSA 5.2.0.
- In Serving > Front Ends > Output Format > Page Layout Helper > Global Attributes there's a setting Enable ASR / Enable Advanced Search Reporting. I never researched what this meant, but it sounded like a good thing, so I clicked it. The presenter talked about Self Learning Scoring which is where the GSA learns which links are better results based on which links people click on in search results. This is ASR, Advanced Search Reporting.
- GSA Unification is for connecting multiple GSAs together and is available in version 6.x for the newer GSA machines, but not the 1001.
- Search Dates are controlled by document dates. The presenter recommended that we select meta, provide a custom meta field and modify our pages to include that meta field and control what date we want the content to use. The demo looked like there was a dropdown in 6.x, but in 5.2.0, there are two sections on the Serving > Result Biasing config page.
- The GSA is case sensitive for its crawls (I'm assuming because it is running on a *nix platform), so the same document could be returned several times in the results depending on how it is called. For example, MyDocument.doc and mydocument.doc are two different results. Now that I'm thinking about it more, though, I'm not sure if this means the document has to exist under two differently cased names or if it is simply linked with two differently cased names. I was under the impression that it would be indexed multiple times if there were multiple links with different variations of case.
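One way to check a site for this before crawling is to look for links that differ only by case. The Python sketch below groups a list of URLs by their lowercased form and reports any group with more than one spelling. The function name case_variants and the sample URLs are hypothetical, and lowercasing the whole URL is a simplification, since query strings can legitimately be case sensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_variants(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return the URLs that differ only by letter case, keyed by lowercased URL."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    # Keep only the groups where more than one distinct spelling was seen.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Made-up links for illustration.
links = [
    "http://www.example.com/docs/MyDocument.doc",
    "http://www.example.com/docs/mydocument.doc",
    "http://www.example.com/docs/other.doc",
]
for spellings in case_variants(links).values():
    print(spellings)
```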
- Another useful link was given: http://gigz.com/google.htm
Bonus: the homepage contains useful links for a variety of other technologies that I'm interested in!
- And finally, I was excited to hear ColdFusion mentioned in the presentation! w00t!
Links from notes:
http://www.heritage.org/ - example of basic search using GSA
http://www.muschealth.com/ - example of a search integrated into the website using GSA
https://addons.mozilla.org/en-US/firefox/addon/1290 - URL Params Firefox Plug-in
chrome://urlparams/content/lib/addpanel.xul - Enables URL Params in Firefox as a Sidebar
http://www.learngsa.com/ - Learning site for the Google Search Appliance
http://gigz.com/google.htm - Collection of useful links for the Google Search Appliance
This training is still available in Dallas at the time of writing this entry. You can register at http://www.figleaf.com/Training/GoogleSeminar.cfm

