Indexing: Gathering Data
- Robots (aka Spiders, Crawlers, Intelligent Agents, as used by web-wide search engines; see the crawler sketch after this outline)
  - Can index both local & remote sites
  - Follow links (except links inside Java applets or generated by JavaScript)
  - Read SSI & generated data; some can read dynamic URLs (with ?)
- File System (see the directory-walk sketch after this outline)
  - Index all files in specified directories
  - Local/mounted servers only
  - Cannot see dynamic or SSI-style generated text (good and bad)
  - Requires squeaky-clean directories, or you get obsolete pages indexed
- Databases (see the database-indexing sketch after this outline)
  - Direct Indexing (SQL, ODBC, JDBC)
    - Can be very fast & efficient
    - Can update based on change listings
    - Requires access to the database
  - Spider Indexing via the Web
    - Remote access
    - Pages indexed = pages users see
    - May have date and update inefficiencies
- Coverage & Currency (see the index-cleanup sketch after this outline)
  - Index everything: there's nothing worse than knowing something is on a web site and not being able to find it
  - Update indexes as you add data (weekly, daily, hourly!)
  - Remove deleted pages from the index (no 404s!)
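A minimal crawler sketch of the robot/spider approach above, using only Python's standard library: it fetches pages over HTTP, pulls <a href> links out of the returned HTML (so links produced by Java applets or JavaScript never appear, as noted), resolves relative URLs, and keeps dynamic URLs containing ?. The start URL, page limit, and dict-shaped "index" are placeholders, not any particular engine's behavior.

# Minimal breadth-first crawler sketch (standard library only).
# Hypothetical start URL and page limit; a real robot also honors robots.txt.
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen
from collections import deque

class LinkParser(HTMLParser):
    """Collects href values from <a> tags; script-generated links never appear here."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=50):
    index = {}                       # url -> page text (stand-in for a real index)
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue                 # unreachable page: skip it
        index[url] = html            # server output is indexed, including SSI-generated text
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute, _ = urldefrag(urljoin(url, href))  # resolve relative links, drop #fragments
            # dynamic URLs (with ?) are kept -- some engines skip them
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index

if __name__ == "__main__":
    pages = crawl("http://www.example.com/")   # placeholder site
    print(len(pages), "pages gathered")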
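A directory-walk sketch for file-system indexing, assuming a local or mounted document root; the root path and extension filter are illustrative. Because it reads raw files, SSI directives and script output are not expanded, and anything left lying around in those directories gets indexed.

# File-system indexer sketch: walks specified directories on a local/mounted server.
# The document root and the extension filter are illustrative assumptions.
import os

DOC_ROOT = "/var/www/htdocs"             # placeholder: local or mounted document root
INDEXABLE = {".html", ".htm", ".shtml", ".txt"}

def index_files(root):
    index = {}                           # path -> raw file text
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in INDEXABLE:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    index[path] = f.read()   # raw source: SSI directives are NOT expanded
            except OSError:
                continue
    return index

# Everything under DOC_ROOT is indexed, so obsolete or draft files left in these
# directories show up in search results -- hence the "squeaky-clean directories" rule.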
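A direct-indexing sketch against a database, using sqlite3 so it stays self-contained; an ODBC/JDBC connection would follow the same pattern. The documents table, its column names, and the timestamp format are assumptions; the point is the change-listing query that fetches only rows modified since the last indexing run.

# Direct database indexing sketch via SQL (sqlite3 here to stay self-contained).
# Table and column names are hypothetical.
import sqlite3

def index_changed_rows(db_path, last_run):
    """Fetch only rows changed since the last indexing run (a 'change listing')."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT id, title, body, modified FROM documents WHERE modified > ?",
            (last_run,),
        ).fetchall()
    finally:
        conn.close()
    index_update = {}
    for doc_id, title, body, modified in rows:
        index_update[doc_id] = f"{title}\n{body}"   # text handed to the search index
    return index_update

# Usage (placeholder path and timestamp): incremental updates keep the index current
# without re-reading the whole table, but they require direct access to the database
# rather than going through the web server.
# updates = index_changed_rows("site_content.db", "2000-07-14 00:00:00")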
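An index-cleanup sketch for the "no 404s" rule: re-check every indexed URL with a HEAD request and drop entries that now return 404. The dict-shaped index is the same placeholder used in the crawler sketch above.

# Index-cleanup sketch: drop entries whose pages now return 404 ("no 404s!").
# The dict-shaped index is an illustrative assumption.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def remove_dead_pages(index):
    """index: dict mapping URL -> indexed text; returns a copy without deleted pages."""
    alive = {}
    for url, text in index.items():
        try:
            request = Request(url, method="HEAD")    # HEAD is enough to check existence
            with urlopen(request) as response:
                status = response.status
        except HTTPError as err:
            status = err.code
        except URLError:
            continue                                 # unreachable host: drop for now
        if status != 404:
            alive[url] = text
    return alive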
Thunderlizard Web Usability 2000
Seattle, July 21
Avi Rappoport: Search Tools Consulting
www.searchtools.com