Tuesday, April 15, 2008

Google Custom Search Engine (CSE) for ASP.Net C#

What is the Google Custom Search Engine (CSE)? It is a service Google offers that lets you search a predefined list of websites from your own website. So if you ever wanted a professional search engine on your website to let users search your content, then here it is.

Quote from Google
Have a website or collection of sites you'd like to search over? With Custom Search Engine, you can harness the power of Google to create a search engine tailored to your needs.

Create a search engine tailored to your needs

  • Include one website, multiple websites, or specific webpages
  • Host the search box and results on your own website
  • Customize the colors and branding to match your existing webpages
For more information go to: http://www.google.com/coop/cse/.

When you go to that website (you need a Gmail login; I believe sign-up is open, otherwise send me an e-mail for an invite), choose 'Create a Custom Search Engine' and fill out the forms. When you have finalized the setup of your CSE, go to the 'Code' page and copy the values of 'cx' and 'cof' from the JavaScript listed in the 'Search box code' textbox.

If your website is not built in ASP.Net, you would simply copy and paste this JavaScript into your search page. An ASP.Net page, however, is already wrapped in a single server-side <form runat="server"> tag, and because you cannot nest <form> tags, pasting Google's search form into it gets you into trouble. For that I created a solution that I'd like to share with you.

All you need to do is place this ASP.Net user control on your ASP.Net page and copy the 'cx' and 'cof' values into the control declaration, and you have your own Google Custom Search Engine on your website. For an example of what this looks like, please go to my search page.

Below is how you add the user control to your ASP.Net page.
Using the Google Search control
<%@ Register TagPrefix="uc" TagName="GoogleSearch" Src="~/Controls/GoogleSearch.ascx" %>
<uc:GoogleSearch ID="GoogleSearch" Runat="Server" Cx="001028438527593528881:1xv9ufijyhm" Cof="FORID:11" Title="Search the Nijhof.com website" />
Here is the markup of the user control; as you can see, it is a very simple control.
File name: GoogleSearch.ascx
<%@ control language="C#" autoeventwireup="true" codefile="GoogleSearch.ascx.cs" inherits="Controls_GoogleSearch" %>
<link rel="Stylesheet" type="text/css" href="/Css/GoogleSearch.css" />
<asp:panel id="SearchBoxPanel" runat="server" cssclass="cse-box">
    <asp:panel id="InputPanel" runat="server" cssclass="cse-branding-form">
        <asp:textbox id="q" runat="server" />
        <asp:button id="GoogleSearch" runat="server" text="Search" causesvalidation="false" />
    </asp:panel>
    <asp:panel id="ImagePanel" runat="server" cssclass="cse-branding-logo">
        <asp:image id="GoogleImage" runat="server" />
    </asp:panel>
    <asp:panel id="TextPanel" runat="server" cssclass="cse-branding-text">
        <asp:literal id="GoogleText" runat="server" />
    </asp:panel>
</asp:panel>
<asp:literal id="GoogleScript" runat="server" />
<asp:panel id="SearchResults" runat="server" cssclass="cse-results">
</asp:panel>
This is the code-behind page of the user control. Here we forward the search query to Google and process the results so that they fit nicely inside the page, and so that the links to more search results and other Google options point to your page instead of the Google website.

We will first look at the 'Page_Load(object sender, EventArgs e)' method. In there we check whether a PostBack happened. If it did and the search textbox is not empty, we assume that the PostBack is for the Google search and forward the request to Google using the 'ProcessUrl(string url)' method. You may want to trigger this on the click event of the search button instead. When we arrive on the page without a PostBack, we check whether the page has a 'q' QueryString variable. If it does, we assume the user clicked one of the modified links in the search results because he wants another results page or another Google option, so we forward this request to Google again using the 'ProcessUrl(string url)' method.
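The two branches of Page_Load boil down to building one of two Google URLs. The sketch below pulls that logic into a small stand-alone class so it can be tried outside a page. The class and method names are made up for this example, and it uses Uri.EscapeDataString (which encodes a space as %20) where code inside the control would use Server.UrlEncode (which encodes it as +). It is a sketch under those assumptions, not the control's actual code.

```csharp
using System;

// Hypothetical helper: not part of the original control.
public static class CseUrls
{
    // PostBack branch: a new search typed into the textbox.
    // Every value is escaped, so a query such as "C# & ASP.Net" cannot
    // end the q parameter early or start a URL fragment.
    public static string BuildSearchUrl(string cx, string query, string cof)
    {
        return "http://www.google.com/cse?cx=" + Uri.EscapeDataString(cx)
            + "&q=" + Uri.EscapeDataString(query)
            + "&cof=" + Uri.EscapeDataString(cof)
            + "&ie=utf-8&oe=utf-8";
    }

    // Non-PostBack branch: the user followed one of the rewritten result links,
    // so the incoming query string (already URL-encoded) is forwarded unchanged.
    // Returns null when there is no q parameter, i.e. nothing to forward.
    public static string BuildForwardUrl(string rawQueryString)
    {
        string queryString = rawQueryString.TrimStart('?');
        foreach (string pair in queryString.Split('&'))
        {
            if (pair.StartsWith("q=", StringComparison.Ordinal))
            {
                return "http://www.google.com/custom?" + queryString;
            }
        }
        return null;
    }
}
```

For example, BuildSearchUrl("123:abc", "C# & ASP.Net", "FORID:11") escapes the colons, the '#' and the '&', so Google receives the whole phrase as the value of q.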

The method 'ProcessUrl(string url)' requests the results from Google. After it receives the answer, it removes some unneeded HTML tags, because we load the results inside an existing page, and it alters some links in the search results. These are the links pointing to more search results and other Google options; we now point them to our own page, which gives the user an even better experience. The 'GoogleSearch.xml' file contains pairs of a regular expression and its replacement text, which are used to search for and replace HTML tags. So if Google decides to change the layout of its search page, you can alter these values.
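The post-processing in ProcessUrl can be tried the same way. The hypothetical ResultFilter class below takes the raw HTML, the contents of GoogleSearch.xml and the path of the hosting page (what Request.FilePath returns inside the control), and applies the same steps in the same order: strip newlines, rewrite the /custom links, then run every filter. It is a sketch for experimenting with filters, not a drop-in replacement. Note that the ReplaceByText values are regex replacement strings, so a $ in them has a special meaning.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical stand-alone version of the post-processing in ProcessUrl.
public static class ResultFilter
{
    public static string Apply(string html, string filterXml, string pagePath)
    {
        // Work on a single line, as ProcessUrl does.
        string document = html.Replace("\n", "");

        // Point "more results" and other Google links at our own page.
        document = Regex.Replace(document, "=\"/custom", "=\"" + pagePath, RegexOptions.IgnoreCase);

        // Read the pattern/replacement pairs from the filter XML.
        var filters = from item in XDocument.Parse(filterXml).Descendants("Filter")
                      select new
                      {
                          Pattern = (string)item.Element("OriginalText"),
                          Replacement = (string)item.Element("ReplaceByText")
                      };

        foreach (var filter in filters)
        {
            document = Regex.Replace(document, filter.Pattern, filter.Replacement ?? "", RegexOptions.IgnoreCase);
        }
        return document;
    }
}
```

With a single filter that removes </body></html>, a result link such as href="/custom?q=test&start=10" comes out as href="/Search.aspx?q=test&start=10" when the hosting page is /Search.aspx (a made-up page name).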

The method 'GenerateScript()' is a copy of the JavaScript that Google provides to show the watermark in the search box. It needed to be altered a little, because ASP.Net changes the IDs of the controls depending on where they are loaded, so the ID of the search textbox is not predictable. The script therefore uses the textbox's ClientID.
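To see why the script cannot simply be pasted in: ASP.Net renders the textbox declared as q with a generated client ID that includes its naming containers. The hypothetical helper below takes that ID as a parameter. The ID in the example is made up, and the Win32 border styling from Google's original script is left out to keep the sketch short.

```csharp
using System;

// Hypothetical helper: a trimmed version of the watermark script from GenerateScript().
public static class WatermarkScript
{
    // clientId: the rendered ID of the search textbox (q.ClientID in the control).
    // watermarkUrl: the image shown while the textbox is empty.
    public static string Build(string clientId, string watermarkUrl)
    {
        return "<script type='text/javascript'>(function() {"
            + " var q = document.getElementById('" + clientId + "');"
            + " if (q) {"
            // Show the watermark while the box is empty...
            + " var b = function() { if (q.value == '') { q.style.background = '#FFFFFF url(" + watermarkUrl + ") left no-repeat'; } };"
            // ...and clear it as soon as the box gets the focus.
            + " var f = function() { q.style.background = '#ffffff'; };"
            + " q.onfocus = f; q.onblur = b;"
            // Only draw it on load when the URL does not already carry a query.
            + " if (!/[&?]q=[^&]/.test(location.search)) { b(); }"
            + " } })();</script>";
    }
}
```

For example, WatermarkScript.Build("ctl00_Main_GoogleSearch_q", "/Images/Misc/google_custom_search_watermark.gif") produces a script that looks up ctl00_Main_GoogleSearch_q, whichever container the control ends up in.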
File name: GoogleSearch.ascx.cs
using System;
using System.Configuration;
using System.IO;
using System.Net;
using System.Text;
using System.Linq;
using System.Text.RegularExpressions;
using System.Web.UI;
using System.Xml.Linq;

/// <summary>
/// Google Search user control, using the Google Custom Search Engine (CSE)
/// http://www.google.com/coop/cse/?hl=en
/// </summary>
public partial class Controls_GoogleSearch : UserControl
{
    /// <summary>
    /// Variable containing the cof value provided by Google when creating a CSE
    /// </summary>
    private string _Cof;
    /// <summary>
    /// Variable containing the cx value provided by Google when creating a CSE
    /// </summary>
    private string _Cx;

    /// <summary>
    /// Variable containing the text that is being displayed behind the Google image
    /// </summary>
    private string _Title;

    /// <summary>
    /// Gets or sets the cx value provided by Google when creating a CSE.
    /// </summary>
    /// <value>The cx.</value>
    public string Cx
    {
        get { return _Cx; }
        set { _Cx = value; }
    }
    /// <summary>
    /// Gets or sets the cof value provided by Google when creating a CSE.
    /// </summary>
    /// <value>The cof.</value>
    public string Cof
    {
        get { return _Cof; }
        set { _Cof = value; }
    }

    /// <summary>
    /// Gets or sets the text that is being displayed behind the Google image
    /// </summary>
    /// <value>The title.</value>
    public string Title
    {
        get { return _Title; }
        set { _Title = value; }
    }

    /// <summary>
    /// Initializes a new instance of the <see cref="Controls_GoogleSearch"/> class.
    /// </summary>
    public Controls_GoogleSearch()
    {
        _Title = "";
        _Cof = "";
        _Cx = "";
    }

    /// <summary>
    /// Handles the Load event of the Page control.
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">The <see cref="System.EventArgs"/> instance containing the event data.</param>
    protected void Page_Load(object sender, EventArgs e)
    {
        GoogleScript.Text = GenerateScript();
        GoogleImage.ImageUrl = Page.ResolveUrl("~/Images/Misc/Google.gif");
        GoogleText.Text = _Title;

        if (IsPostBack)
        {
            if (q.Text != "")
            {
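                // Encode every value: a query like "C# & ASP.Net" would otherwise break the URL.
                string Url = "http://www.google.com/cse?cx=" + Server.UrlEncode(Cx) + "&q=" + Server.UrlEncode(q.Text) + "&cof=" + Server.UrlEncode(Cof) + "&ie=utf-8&oe=utf-8";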
                ProcessUrl(Url);
            }
        }
        else
        {
            if (Request.QueryString["q"] != null)
            {
                string Url = "http://www.google.com/custom?" + Request.QueryString;
                q.Text = Request.QueryString["q"];
                ProcessUrl(Url);
            }
        }
    }

    /// <summary>
    /// Processes the URL.
    /// </summary>
    /// <param name="url">The URL.</param>
    private void ProcessUrl(string url)
    {
        string document;

        // Request the results page from Google; the using blocks make sure the
        // response and its stream are closed even if reading fails.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            // The search asks Google for UTF-8 output (oe=utf-8), so decode the whole
            // stream as UTF-8. Decoding 8 KB chunks as ASCII would mangle non-ASCII
            // characters and could split multi-byte sequences between chunks.
            document = reader.ReadToEnd();
        }
        document = Regex.Replace(document, "\\n", "", RegexOptions.IgnoreCase);
        document = Regex.Replace(document, "=\"/custom", "=\"" + Request.FilePath, RegexOptions.IgnoreCase);

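        string GoogleSearchXmlSetting = ConfigurationManager.AppSettings["GoogleSearchXml"];
        if (string.IsNullOrEmpty(GoogleSearchXmlSetting))
        {
            throw new ConfigurationErrorsException("The appSettings key 'GoogleSearchXml' is not defined in the Web.Config file");
        }
        string GoogleSearchXmlPath = Server.MapPath(GoogleSearchXmlSetting);
        if (!File.Exists(GoogleSearchXmlPath))
        {
            throw new FileNotFoundException("The GoogleSearch.xml file referenced by the Web.Config appSettings key 'GoogleSearchXml' could not be found", GoogleSearchXmlPath);
        }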
        XDocument xDocument = XDocument.Load(GoogleSearchXmlPath);
        var items = from item in xDocument.Descendants("Filter")
                    select new
                    {
                        OrgText = ((string)item.Element("OriginalText")),
                        NewText = ((string)item.Element("ReplaceByText"))
                    };

        foreach (var item in items)
        {
            document = Regex.Replace(document, item.OrgText, item.NewText, RegexOptions.IgnoreCase);
        }
        SearchResults.Controls.Add(new LiteralControl(document));
    }

    /// <summary>
    /// Provides the script that has been copied and altered from the original Google CSE script.
    /// The script had to be customized because ASP.Net generates the client IDs of the controls.
    /// </summary>
    /// <returns>The script as a string</returns>
    private string GenerateScript()
    {
        return ("<script language='javascript' type='text/javascript'> (function() { var q = "+
                "document.getElementById('" + q.ClientID + "'); if (q) {var n = navigator; var "+
                "l = location; if (n.platform == 'Win32') { q.style.cssText = 'border: 1px solid "+
                "#7e9db9; padding: 2px;'; } var b = function() { if (q.value == '') { q.style.background "+
                "= '#FFFFFF url(/Images/Misc/google_custom_search_watermark.gif) left no-repeat'; } }; var "+
                "f = function() { q.style.background = '#ffffff'; }; q.onfocus = f; q.onblur = b; "+
                "if (!/[&?]q=[^&]/.test(l.search)){ b();} } })(); </script>");
    }
}
Below is the XML file with the regular expressions used to remove and alter the HTML provided by Google. If you want to strip certain tags from the output, this is the place to do it. I have intentionally left Google's <style> tags in, because I like their look and feel. On the Google Custom Search Engine pages you also have the option to alter the color scheme used for the results page.
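Keep in mind that ProcessUrl removes every newline before these filters run, so a . in a pattern can match across the whole results page. The difference between a greedy and a lazy pattern then matters as soon as Google returns more than one script block, as this small check shows (the class name and sample HTML are made up for the example):

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing a greedy and a lazy script filter.
public static class ScriptFilterDemo
{
    // Greedy: .* runs to the LAST "script>" in the document, so everything
    // between the first and the last script block is removed as well.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, "<script(.*)script>", "", RegexOptions.IgnoreCase);
    }

    // Lazy: .*? stops at the first "</script>", so each block is removed on its own.
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, "<script(.*?)</script>", "", RegexOptions.IgnoreCase);
    }
}
```

For the input <script>a()</script><p>result 1</p><script>b()</script>, StripGreedy returns an empty string, while StripLazy returns <p>result 1</p>.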
File name: GoogleSearch.xml
<?xml version="1.0" encoding="utf-8" ?>
<GoogleSearch>
  <ResultFilters>
    <Filter>
      <OriginalText><![CDATA[<html><head>(.*)</title>]]></OriginalText>
      <ReplaceByText><![CDATA[]]></ReplaceByText>
    </Filter>
    <Filter>
      <OriginalText><![CDATA[</head>(.*) marginheight=2>]]></OriginalText>
      <ReplaceByText><![CDATA[]]></ReplaceByText>
    </Filter>
    <Filter>
      <OriginalText><![CDATA[</body></html>]]></OriginalText>
      <ReplaceByText><![CDATA[]]></ReplaceByText>
    </Filter>
    <Filter>
      <OriginalText><![CDATA[ width=100% ]]></OriginalText>
      <ReplaceByText><![CDATA[ align=right ]]></ReplaceByText>
    </Filter>
    <Filter>
      <OriginalText><![CDATA[</table><table]]></OriginalText>
      <ReplaceByText><![CDATA[</table><br /><table]]></ReplaceByText>
    </Filter>
    <Filter>
      <OriginalText><![CDATA[<script(.*?)</script>]]></OriginalText>
      <ReplaceByText><![CDATA[]]></ReplaceByText>
    </Filter>
  </ResultFilters>
</GoogleSearch>
You should point to the location of the XML file in your Web.Config file.
File name: Web.Config
<appSettings>
  <add key="GoogleSearchXml" value="/Controls/GoogleSearch.xml" />
</appSettings>
Finally, the style sheet that gives the search box, button, Google logo and text their look and feel. Nothing more, nothing less.
File name: GoogleSearch.css
.cse-branding-form 
{
 height: 20px;
 width: 220px;
 float: left;
}
.cse-branding-logo
{
 height: 20px;
 width: 60px;
 padding-top: 2px;
 display: block;
 float: left;
}
.cse-branding-text 
{
 height: 20px;
 width: 270px;
 padding-top: 6px;
 display: block;
 font: 10px Arial;
 float: left;
}
.cse-box
{
 height: 20px;
 width: 555px;
}
.cse-results
{
 width: 95%;
}
You may download a copy of the source code for your own use: Google_CSE_ASP.Net.zip. It would be nice if you sent me an e-mail to let me know whether you found the code useful (or not). Please note that you need version 3.5 of the .Net framework to run this code.

6 comments:

Migofast said...

Just what I needed!

Now how do I get the results to show up in a results page instead of the page/panel the control is on?

Mark Nijhof said...

You may wish to hide the SearchBoxPanel in the GoogleSearch.ascx file; this will hide the box above the results. Then place the control on the results page.

Then place a TextBox control named q somewhere on your site and post its value to the results page; this can be done using either POST or GET.

Stop the method GenerateScript() from being called. It generates the overlay image in the TextBox, so you may wish to move this function into the page where the search box is.

Let me know if you manage with the short description.

-Mark

jld said...

This is the result code:
<li><div class="g">
... </div>
<li>
...
The closing </li> is missing on each item.
Is it a Google bug, or is it possible to fix it?

Thank you

jld said...

About showing results on a separate page: I'm not sure I understand your short description. Could you give a little sample code?
Thanks a lot

Mark Nijhof said...

Let me come back to you later tonight. I'll see what I can do.