Monday, April 07, 2008

Leveraging Microsoft Indexing Service in the Enterprise

There are quite a few tutorials on the internet about Indexing Service, but I found none that showed me all of what I planned to accomplish in my environment. This prompted me to write this HowTo.

Microsoft describes it this way: "Indexing Service is a base service for Microsoft® Windows® 2000 or later that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching."

My goal was to leverage this technology to help my users search for documents in our network shares. We have a number of them, and I've found that even if you promote good "house cleaning" rules, you invariably end up with a mass of documents that becomes difficult to navigate. This usually results in frequent phone calls like, "Have you seen this?" Nope, haven't seen it, but if you call Bill, I bet he knows...

While MS Indexing Service, or Desktop Search, works well for the end user's local documents when tuned properly, it doesn't do a darn thing for network files. Indexing needs to be set up at the server, and then published in some manner so that users can search for documents effectively.

Step One - Choose the Server

Choose the server that will house the Index databases and provide IIS for the ASP pages used for searching. This doesn't have to be the server that shares the files, as you'll soon see. Keep in mind, however, that if you plan to index a large number of documents, the initial indexing will require some horsepower. My environment consists of:

  1. Six network indexes.
    • Each index database ranges in size from 3 MB to 20 GB.
    • Each network repository holds between 500 and 200,000 indexed documents.
  2. My Indexing server is a Dell PowerEdge 2600 with a 2.8 GHz Xeon and 2 GB of RAM.

Out of the box, Indexing Service indexes only:
  1. Text Files
  2. HTML Files
  3. Microsoft Office documents
You can also add IFilters to extend Indexing Service's ability to extract content from file types it doesn't support out of the box. Browse to iFilter.org for a list of freely downloadable filters. They include PDF, Visio, and many others.

Set up an Indexing Service account that has administrative privileges but is not the Domain Administrator account. The last thing you need is to create issues when you reset the Domain Admin password and have to track down every service that uses it... that's a drag!

Step Two - Set Up Your Indexes

First, right-click Indexing Service, choose Properties, and on the Generation tab check "Index files with unknown extensions" and "Generate abstracts". Doing this will catch files that have been incorrectly named, and will generate an abstract that is displayed when our ASP search page finds a document. Also, unless you intend to index your web pages, remove any other Catalogs by right-clicking and deleting them. You'll have to stop Indexing Service first.

Once the Catalog name and the location of the Catalog (database) are set, choose a location to index and an account to use for this process.

Once these are set up, restart Indexing Service, since the Catalogs won't build until you do. With that done, set each of your Indexes thusly...

On the Tracking and Generation tabs, tick "Inherit above settings from Service", and set the WWW server to None so as to avoid indexing your web site. Click OK and, in the M$ style of service changes, restart the service to be sure your settings apply.

Step Three - Set Up the ASP Search Page

This is the best part, IMHO. Now you expose these Catalogs for your users' searching pleasure. I found bits of this out on the web, and M$ also put together a couple of ASP examples that can be found in the IISSamples directory of your web server. Some documentation on their use can be found here: http://msdn2.microsoft.com/en-us/library/ms692879(VS.85).aspx.

Take this sample, change line 21 to substitute your server's NetBIOS name, and modify the SELECT and OPTION tags at lines 139 through 146 to substitute your Indexing locations. The JavaScript referenced in the script include on line 9 is a utility that checks for a valid search term and location.
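The SELECT markup itself is not reproduced in this post, so here is a minimal sketch of what it could look like, under a few assumptions. The script below sets `Catalog = Request.Form("Scope")` and `Scope = IndexServer & Request.Form("Scope")`, which implies that each OPTION value has to be a share name that is also the name of its Catalog. The function name `BuildScopeSelect` and the three share names are made up for illustration. It is written as a standalone script for cscript; in an ASP page you would emit the result with Response.Write rather than WScript.Echo.

```vbscript
Option Explicit

' Hypothetical share names. Each one is assumed to match both a share on
' the indexing server and the name of the Catalog that indexes it.
Dim locations
locations = Array("Accounting", "Engineering", "HR")

' Builds a <SELECT> named "Scope" with one <OPTION> per location.
Function BuildScopeSelect(shareNames)
    Dim html, shareName
    html = "<SELECT NAME=""Scope"">" & vbCrLf
    For Each shareName In shareNames
        html = html & "  <OPTION VALUE=""" & shareName & """>" & _
               shareName & "</OPTION>" & vbCrLf
    Next
    BuildScopeSelect = html & "</SELECT>"
End Function

WScript.Echo BuildScopeSelect(locations)
```

Keeping the share name and the Catalog name identical is what lets a single form field drive both values; if yours differ, you would need a lookup from one to the other.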

My simple mod is here:

<%
    ' Customization variables
    DebugFlag = FALSE         ' set TRUE for debugging
    UseSessions = TRUE        ' set FALSE to disable use of session variables
    RecordsPerPage = 20        ' number of results on a page
    MaxResults = -1            ' total number of results returned

    ' Hard-code some parameters that could be taken from the form
    ' SortBy = "rank[d]"       ' sort order
    
    IndexServer = "\\IBSFP\"	'Our indexing server

    ' Set initial  conditions
    NewQuery = FALSE
    UseSavedQuery = FALSE
    SearchString = ""
    QueryForm = Request.ServerVariables("PATH_INFO")

    if Request.ServerVariables("REQUEST_METHOD") = "POST" then
        SearchString = Request.Form("SearchString")
        DocAuthorRestriction = Request.Form("DocAuthorRestriction")
        FileTypeRestriction = Request.Form("FileTypeRestriction")
        FSRest = Request.Form("FSRest")
        FSRestVal = Request.Form("FSRestVal")
        FSRestOther = Request.Form("FSRestOther")
        FMMod = Request.Form("FMMod")
        FMModDate = Request.Form("FMModDate")
        SortBy = Request.Form("SortBy")
        Scope = IndexServer & Request.Form("Scope")
        Catalog = Request.Form("Scope")
        RankBase = Request.Form("RankBase")       
        ' NOTE: this will be true only if the button is actually pushed.
        if Request.Form("Action") = "Search" then
            NewQuery = TRUE
            NextPageNumber = -1
        elseif Request.Form("pg") <> "" then
            NextPageNumber = Request.Form("pg")
            UseSavedQuery = UseSessions
            NewQuery = not UseSessions
        end if
    end if
 %>



<%if DebugFlag then%> <%end if%>
Enter your query :
Select your search location :
Document author :
File type :
Where File Size is :
Modified :
Back       Tips for searching
    SearchString         = <%=SearchString%>
    DocAuthorRestriction = <%=DocAuthorRestriction%>
    FileTypeRestriction  = <%=FileTypeRestriction%>
    FSRest               = <%=FSRest%>
    FSRestVal            = <%=FSRestVal%>
    FSRestOther          = <%=FSRestOther%>
    FMMod                = <%=FMMod%>
    FMModDate            = <%=FMModDate%>
    SortBy               = <%=SortBy%>
    Scope                = <%=Scope%>
    NewQuery             = <%=CStr(NewQuery)%>
    UseSavedQuery        = <%=CStr(UseSavedQuery)%>
    
<%
    if NewQuery then
        if UseSessions then
            set Session("Query") = nothing
            set Session("Recordset") = nothing
        end if
        NextRecordNumber = 1

        set Q = Server.CreateObject("ixsso.Query")

        ' Build the query by prepending each restriction to the text query
        Composer = ""
        TheQuery = ""
        if SearchString <> "" then
            if Left( SearchString, 1 ) <> "@" AND Left( SearchString, 1 ) <> "#" AND Left( SearchString, 1 ) <> "$" then
                TheQuery = "(@Contents " + SearchString + ")"
            else
                TheQuery = "(" + SearchString + ")"
            end if
            Composer = " & "
        end if
        if FSRestVal <> "any" then
            if FSRestVal <> "other" then
                TheQuery = "(@Size " + FSRest + FSRestVal + ") " + Composer + TheQuery
            else
                TheQuery = "(@Size " + FSRest + FSRestOther + ") " + Composer + TheQuery
            end if
            Composer = " & "
        end if
        if FileTypeRestriction <> "any" then
            TheQuery = "(#filename " + FileTypeRestriction + ") " + Composer + TheQuery
            Composer = "& "
        end if
        if DocAuthorRestriction <> "" then
            TheQuery = "(@DocAuthor " + DocAuthorRestriction + ") " + Composer + TheQuery
            Composer = " & "
        end if
        if FMMod <> "" AND FMMod <> "any" then
            if FMMod <> "since" then
                TheQuery = "(@Write > " + FMMod + ") " + Composer + TheQuery
            else
                TheQuery = "(@Write > " + FMModDate + ") " + Composer + TheQuery
            end if
        end if

        if DebugFlag then
            Response.Write "TheQuery = " & Server.HTMLEncode( TheQuery ) & "<BR>"
        end if

        Q.Catalog = Catalog
        Q.Query = TheQuery
        Q.SortBy = SortBy
        Q.Columns = "DocTitle, vpath, path, filename, size, write, characterization, rank"
        if MaxResults <> -1 then
            Q.MaxRecords = MaxResults
        end if

        set Util = Server.CreateObject("ixsso.Util")
        if Scope <> "\" then
            Util.AddScopeToQuery Q, Scope, "DEEP"
        end if

        set RS = Q.CreateRecordSet("nonsequential")
        RS.PageSize = RecordsPerPage
        ActiveQuery = TRUE

    elseif UseSavedQuery then
        if IsObject( Session("Query") ) And IsObject( Session("RecordSet") ) then
            set Q = Session("Query")
            set RS = Session("RecordSet")
            ActiveQuery = TRUE
        else
            Response.Write "ERROR - No saved query"
        end if
    end if

    if ActiveQuery then
        if RS.RecordCount <> -1 and NextPageNumber <> -1 then
            RS.AbsolutePage = NextPageNumber
            NextRecordNumber = RS.AbsolutePosition
        end if

        if not RS.EOF then
            LastRecordOnPage = NextRecordNumber + RS.PageSize - 1
            CurrentPage = RS.AbsolutePage
            if RS.RecordCount <> -1 AND RS.RecordCount < LastRecordOnPage then
                LastRecordOnPage = RS.RecordCount
            end if

            Response.Write "Documents " & NextRecordNumber & " to " & LastRecordOnPage
            if RS.RecordCount <> -1 then
                Response.Write " of " & RS.RecordCount
            end if
            if SearchString <> "" then
                Response.Write " matching the query " & chr(34) & ""
                Response.Write Server.HTMLEncode( SearchString ) & "" & chr(34) & ".<P>"
            end if
%>

<% if Not RS.EOF and NextRecordNumber <= LastRecordOnPage then%>
<%end if%>
<%
    Do While Not RS.EOF and NextRecordNumber <= LastRecordOnPage
        ' This is the detail portion for Title, Abstract, URL, Size, and
        ' Modification Date.
        ' If there is a title, display it, otherwise display the filename.

        ' Graphically indicate rank of document with list of stars (*'s).
        if NextRecordNumber = 1 then
            RankBase=RS("rank")
        end if
        if RankBase>1000 then
            RankBase=1000
        elseif RankBase<1 then
            RankBase=1
        end if
        NormRank = RS("rank")/RankBase
        if NormRank > 0.80 then
            stars = "_includes/rankbtn5.gif"
        elseif NormRank > 0.60 then
            stars = "_includes/rankbtn4.gif"
        elseif NormRank > 0.40 then
            stars = "_includes/rankbtn3.gif"
        elseif NormRank > 0.20 then
            stars = "_includes/rankbtn2.gif"
        else
            stars = "_includes/rankbtn1.gif"
        end if
%>
<%
        RS.MoveNext
        NextRecordNumber = NextRecordNumber+1
    Loop
%>
<%
    ' This is the "previous" button.
    ' This retrieves the previous page of documents for the query.
    SaveQuery = FALSE
    if CurrentPage > 1 and RS.RecordCount <> -1 then
%>
<%SaveQuery = UseSessions%>
<%end if%>
<%if Not RS.EOF then
    ' This is the "next" button.
    ' This button retrieves the next page of documents for the query.
%>
<%SaveQuery = UseSessions%>
<%end if%>
RankDocument
<%= NextRecordNumber%>: <%if VarType(RS("DocTitle")) = 1 or Trim(RS("DocTitle")) = "" then 'check for presence of doctitle %> File :  " class="RecordTitle" target="_blank"><%=RS.Fields("filename")%> <% else %> Title :  " class="RecordTitle" target="_blank"><%=RS.Fields("DocTitle")%> <% end if %>
<%if VarType(RS("characterization")) = 8 and RS("characterization") <> "" then%> Abstract: <%= Server.HTMLEncode(RS("characterization"))%> <%end if%>

" class="RecordStats" target="_blank"><%=RS("path")%>
<%if RS("size") = "" then%>(size and time unknown)<%else%>size <%=RS("size")%> bytes - <%=RS("write")%> GMT<%end if%>

<%
    else   ' NOT RS.EOF
        if NextRecordNumber = 1 then
            Response.Write "No documents matched the query<P>"
        else
            Response.Write "No more documents in the query<P>"
        end if
    end if ' NOT RS.EOF

    if NOT Q.OutOfDate then
        ' If the index is current, display the fact
%>
The index is up to date.
<%
    end if

    if Q.QueryIncomplete then
        ' If the query was not executed because it needed to enumerate to
        ' resolve the query instead of using the index, but AllowEnumeration
        ' was FALSE, let the user know
%>
The query is too expensive to complete.
<%
    end if

    if Q.QueryTimedOut then
        ' If the query took too long to execute (for example, if too much work
        ' was required to resolve the query), let the user know
%>
The query took too long to complete.
<%end if%>

<%
    NextString = "Next "
    if RS.RecordCount <> -1 then
        NextSet = (RS.RecordCount - NextRecordNumber) + 1
        if NextSet > RS.PageSize then
            NextSet = RS.PageSize
        end if
        NextString = NextString & NextSet & " documents"
    else
        NextString = NextString & " page of documents"
    end if
%>
<%
    ' Display the page number
    if RS.PageCount <> 0 then
        Response.Write "Page " & CurrentPage
        if RS.PageCount <> -1 then
            Response.Write " of " & RS.PageCount
        end if
    end if

    ' If either of the previous or back buttons were displayed, save
    ' the query and the recordset in session variables.
    if SaveQuery then
        set Session("Query") = Q
        set Session("RecordSet") = RS
    else
        RS.close
        Set RS = Nothing
        Set Q = Nothing
        Set Util = Nothing
        if UseSessions then
            set Session("Query") = Nothing
            set Session("RecordSet") = Nothing
        end if
    end if
end if
%>
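If you want to see what query string the page hands to Indexing Service without loading it in a browser, the composition logic can be pulled out on its own. What follows is only a sketch, assuming you run it with cscript: the function name `BuildQuery` and the sample arguments are hypothetical, and it covers just the search term, file type, and author restrictions, leaving out size and modification date.

```vbscript
Option Explicit

' Hypothetical helper: composes a query string the same way the page
' does, by prepending each restriction to the text query.
Function BuildQuery(searchString, fileType, author)
    Dim theQuery, composer
    theQuery = ""
    composer = ""

    If searchString <> "" Then
        ' Terms that start with @, # or $ are passed through unchanged;
        ' anything else is treated as a search of document contents.
        Select Case Left(searchString, 1)
            Case "@", "#", "$"
                theQuery = "(" & searchString & ")"
            Case Else
                theQuery = "(@Contents " & searchString & ")"
        End Select
        composer = " & "
    End If

    If fileType <> "any" Then
        theQuery = "(#filename " & fileType & ") " & composer & theQuery
        composer = " & "
    End If

    If author <> "" Then
        theQuery = "(@DocAuthor " & author & ") " & composer & theQuery
    End If

    BuildQuery = theQuery
End Function

' Sample values only; substitute whatever your form would submit.
WScript.Echo BuildQuery("budget", "*.doc", "Bill")
```

Each restriction is joined to the rest with the & (AND) operator, so every filter a user fills in narrows the result set.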
You may have to play around with this a bit, but once your users start using it instead of their Desktop Search, they will quickly find this is much more efficient, and thank you.
