
SEO for Ajax and Silverlight Applications

This post contains a writeup of an Ajax pattern I demonstrated at one of my MIX07 talks for improving the indexability of Ajax and Silverlight applications, and RIA applications in general, as part of an SEO effort...

Search engine optimization, or SEO, is a key concern for web sites, and a key ingredient is indexability, which works well for static content. However, the model breaks down as applications become more dynamic and start using Ajax or RIA-based technologies such as Silverlight (or Flash) to dynamically fetch and display content using client-side logic. In fact, the HTML page often becomes a mere shell containing presentation and behavior, and doesn't contain the meaningful data that needs to be indexed. The fact that the application is data-driven implies that there wouldn't be much useful data to extract from it, even if it were to be indexed. I'll additionally claim that most post-back based sites also suffer from a lack of indexability, because fundamentally the model that works with search engines is navigation.

At MIX07, I presented a pattern in my Ajax patterns talk for improving the indexability of Ajax and Silverlight applications, in the context of a slide show example. I am putting down in writing what I described on stage, with the hope of fleshing it out further based on comments. As feared, writing this turned out to be lengthier than expected; however, I hope it makes for an interesting read. This is mostly a discussion of the pattern itself, but if you're interested in seeing a working sample as well, check out the SlideShow server control in the presentation download.

I still don't have a good name for the pattern. Perhaps I'll call it "HTML Data Channel for RIAs"... suggestions on naming are welcome... or is there an existing name for this approach? At a high level, the idea is basically to use static HTML, or more accurately, semantically correct markup, embedded in the page as the data delivery mechanism for an RIA.

The scenario from my demo is as follows: I have a page with a flickr-like tag cloud and a slide show with fancy transition effects for displaying the photos matching the selected tag. I am using Ajax and Silverlight to implement the next/previous interaction, rather than post-backs or navigation to separate pages. Furthermore, tag selections in the tag cloud are also handled in client-side code that uses XMLHttp to fetch the list of photos and updates the slide show in-place. The problem, of course, is that the application doesn't score high on indexability. The search engine sees this application as a page with an empty placeholder that the client-side code happens to fill with an image on the fly. The goal, then, is to serve up a page that is search engine friendly, without trying to detect a search engine crawler on the server, and without maintaining a parallel static version of the page.

So here is what the SlideShow server control renders out.

    [rich view: a Silverlight tag, hidden by default]
    [script block: document.write opens a hidden container]
      [photo #1 title] [photo #1 description]
      [photo #2 title] [photo #2 description]
      ...
    [script block: document.write closes the hidden container]

Essentially the SlideShow control does two things. First, it renders out the rich view, a Silverlight tag in this case, that is hidden by default and dynamically made visible by client script. Second, it offers the capabilities of a Repeater control with an ItemTemplate, which it uses to generate alternate static content that it surrounds with a dynamically rendered hidden container.

Here is what happens in a regular browser, with script enabled: the two script blocks, one at the top of the alternate content and one at the bottom, execute, and the document.write calls surround the static content with a hidden <div> element. This ensures that the alternate content doesn't flash on the page as it is first rendered by the browser, only to be instantly hidden by some initialization script (as you see on some Ajax pages). I used to generally dislike document.write, but sure enough, the API has a nice characteristic that can be leveraged here: it generates markup that is parsed in order, as if it were part of the document rendered by the server.
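
A minimal sketch of the wrapping technique, with hypothetical content (the real control renders the static content from its ItemTemplate):

    <script type="text/javascript">document.write('<div style="display:none">');</script>
    <!-- alternate static content rendered by the server -->
    <ul>
      <li><h3>[photo #1 title]</h3><p>[photo #1 description]</p></li>
      <li><h3>[photo #2 title]</h3><p>[photo #2 description]</p></li>
    </ul>
    <script type="text/javascript">document.write('</div>');</script>

With script enabled, the browser parses the written-out <div> inline and the static content never becomes visible; with script disabled, only the plain markup remains.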

Here is what happens when the page is seen by the search engine, where script is disabled: the two script blocks never get to execute, and as a result the static content isn't wrapped in a hidden container. Furthermore, the static content is designed to be simple and focused on expressing data, not layout or style, and it uses tags such as headings, paragraphs, lists, and links for what they semantically mean (hence the notion of semantically correct markup), so that it indexes well.

You might be wondering why the rich view's tag is invisible by default. This is so that it doesn't consume space on the page if the user is viewing the page with script turned off. If script is enabled, the script gets to toggle its visibility. An interesting fall-out of this mechanism is that a reasonable script-disabled experience of the page can be accommodated without too much pain, if that is a requirement for your application.
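
A sketch of that switch, assuming a hypothetical element id for the rich view's container:

    // Runs only in browsers with script enabled; a crawler with script
    // disabled never executes this and sees the static content instead.
    window.onload = function () {
        // reveal the rich (Silverlight) view that was rendered hidden
        document.getElementById('richView').style.display = 'block';
    };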

In terms of the scenario at hand, the script on the page extracts the images and descriptions to be shown in the slide show from the static HTML content (or the server could render out another copy of the data as a JSON blob). This has another benefit: in lots of Ajax pages, script issues an XMLHttp request upon page load to fetch the initial data, which consumes an additional request to the server and increases the perceived load time. That additional connection and latency can simply be avoided by serving up the first page's worth of data in the initial page rendering.
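
For instance, the startup script might harvest the initial slide data from the static markup along these lines (the id and markup structure are hypothetical):

    // Build the initial slide list from the server-rendered static markup,
    // avoiding an extra XMLHttp request on page load.
    function getInitialSlides() {
        var items = document.getElementById('staticContent').getElementsByTagName('li');
        var slides = [];
        for (var i = 0; i < items.length; i++) {
            slides.push({
                title: items[i].getElementsByTagName('h3')[0].innerHTML,
                description: items[i].getElementsByTagName('p')[0].innerHTML
            });
        }
        return slides;
    }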


To complete the indexability discussion, there is one additional problem to solve. If you have a large dataset, you aren't going to embed it all as static content in one shot. Instead, you're going to use some sort of paging or filtering mechanism. In this scenario, the tags on the page are the filters. Clicking on one of them causes the script to fetch the specific section of data and update the slide show in-place, as sketched below. However, you certainly want the search engine to index all your data. This is where site maps come into play to complement and complete this pattern for achieving indexability.
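
A sketch of that in-place update (the endpoint, response shape, and helper are hypothetical):

    // Fetch the photo list for the selected tag and update the slide
    // show in place, without navigating to a new page.
    function onTagSelected(tag) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', 'SlideShowData.ashx?tag=' + encodeURIComponent(tag), true);
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                var photos = eval('(' + xhr.responseText + ')'); // era-typical JSON parsing
                updateSlideShow(photos); // hypothetical helper that swaps the slides in
            }
        };
        xhr.send(null);
    }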

It is typical that you will use query string parameters (or better yet, rewritten friendly URLs) to filter the results and the static data sent down to the client in the initial page request. Effectively, your page URLs start to look like SlideShow.aspx?tag=travel or perhaps SlideShow.aspx/travel. Rather than simply having the one SlideShow.aspx URL as the single entry point for your RIA, each URL variation becomes an entry point. The client script on your page continues to work unchanged, extracting the data it operates on from the static data to start with and using XMLHttp requests to fetch additional data as the user interacts with the page. Your site as a whole, though, now has a set of URLs representing every section of your entire data set, and they can be listed in a site map. Something like so:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://.../SlideShow.aspx?tag=travel</loc></url>
      <url><loc>http://.../SlideShow.aspx?tag=nature</loc></url>
      ...
    </urlset>

If you have questions, some feedback, or related ideas, please do share using the comment form... I am sure there are other ways to think about this problem, given the importance of, and the amount of discussion around, this topic.

posted on 2007-07-25 15:26 拾荒时代