As of January 2012, this site is no longer being updated, due to work and health issues.

Guide to Search Tools

Robot Crawling and Indexing Interactive Content

(Forms and Flash)


Crawling Flash

The SWF (Flash) file format has been open for a while, and a lot of search engines have used the format to get at some of the static text in the Flash files. However, Flash is now an interactive web site application builder, and a lot of text simply does not exist until someone comes along and clicks. This has meant that people who wanted their sites properly indexed by webwide search engines either could not use Flash or had to go to extra lengths to provide static text for search engine robots to find.

What Adobe and Google have just announced is that Adobe is making a special version of the Flash code that can approximate a human interacting with the Flash application in the SWF file, triggering as many application states as it can. As far as I can tell, the Flash client within the search engine indexing robot will be clicking every possible button and entering text in text fields. While indexing the labels on buttons seems odd at first, it makes sense to think of those labels as anchor text pointing at other pages (or at least URLs).
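To make the idea of "triggering as many application states as it can" concrete, here is a minimal sketch in Python. It is not Adobe's or Google's actual method, which has not been published in detail. It assumes a toy application described as a dictionary of states, where each state has some text and a set of buttons leading to other states. The names `explore_states` and `toy_app` are hypothetical.

```python
from collections import deque


def explore_states(app, start):
    """Breadth-first walk over a toy application, pressing every button in every state."""
    seen = {start}
    queue = deque([start])
    indexed_text = []
    while queue:
        state = queue.popleft()
        # Text that only exists once this state has been reached.
        indexed_text.append(app[state]["text"])
        for label, target in app[state]["buttons"].items():
            # Button labels are collected too, much like anchor text.
            indexed_text.append(label)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed_text


# Hypothetical three-state application, invented for illustration.
toy_app = {
    "home": {"text": "Welcome", "buttons": {"Products": "products", "About": "about"}},
    "products": {"text": "Our widgets", "buttons": {"Home": "home"}},
    "about": {"text": "Company history", "buttons": {}},
}

print(explore_states(toy_app, "home"))
```

The `seen` set is what keeps the walk from looping forever when buttons lead back to states that were already visited, such as a "Home" button.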

The chief concerns I've seen from web site publishers include: the lack of clarity about exactly which JavaScript Flash loading links will be acceptable (especially SWFObject); how external XML files loaded by Flash will be indexed; and how the deep linking into Flash files will work. Adobe has some explanations in their FAQ. At the moment, it covers SWF files only (all versions from the oldest to the current, whether generated by Flash or Flex), which Adobe calls "RIAs" (rich Internet applications). However, they are not providing access to FLV files, which are used on YouTube etc. to contain video for playback, and which rarely have textual metadata.

Adobe says that Yahoo is working on this as well, and that Adobe is "exploring ways to make the technology more broadly available" to other search vendors.

No word on whether that includes enterprise and site search developers. There's an excellent writeup from the SEO point of view at Searchengineland, and searchmarketinggurus has a skeptical response.

Crawling Forms

This is similar to what Googlebot is doing on some site forms: automatically clicking every combination of buttons, menus, and checkboxes, and submitting words from the site in text boxes. This has ended up creating phantom shopping carts and search queries. Googlebot only does this for forms with GET actions, not POST, and presumably will not do so if the page has meta NOINDEX and NOFOLLOW tags.
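As a rough illustration of why this produces so many phantom queries, the sketch below enumerates every URL a robot could generate from a single GET form. It is an assumption about the general approach, not a description of the googlebot's real code. The function name `candidate_get_urls`, the form fields, and the word list are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_get_urls(action, method, choices, text_fields, site_words):
    """Return every URL a crawler could produce from one form.

    choices: dict mapping a field name to its list of possible values
             (menus, radio buttons; None models an unchecked checkbox).
    text_fields: names of free-text inputs, filled with words from the site.
    """
    if method.upper() != "GET":
        return []  # POST forms are skipped

    names = list(choices) + list(text_fields)
    value_lists = [choices[n] for n in choices] + [site_words for _ in text_fields]

    urls = []
    for combo in product(*value_lists):
        # Fields whose value is None are left out of the query string.
        pairs = [(n, v) for n, v in zip(names, combo) if v is not None]
        urls.append(action + "?" + urlencode(pairs))
    return urls


# Hypothetical search form: one menu, one checkbox, one text box.
urls = candidate_get_urls(
    action="/search",
    method="get",
    choices={"category": ["books", "music"], "in_stock": ["on", None]},
    text_fields=["q"],
    site_words=["flash", "robot"],
)
print(len(urls))
for url in urls:
    print(url)
```

Even this tiny form yields 2 x 2 x 2 = 8 distinct URLs, and each extra menu or word multiplies the total, which is how a modest form can fill a server log with search queries that no person ever typed.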

Page Created 2008-07-02


Search Tools Consulting's principal analyst, Avi Rappoport, may be available to help you with selection, analysis, user experience, and functional search engine work. Please contact us with your questions, comments, or possible consulting discussions.


SearchTools.com - Copyright © 2008-2009 Search Tools Consulting.
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.