Strategic Intelligence Briefs
Search Engines: How Do They Work?
These notes complement Alain Paul Martin’s book Harnessing the Power of
Intelligence, Counterintelligence and Surprise Events published in 2002 by
Executive.org.
Most search engines are a hybrid of at least three expert systems: a gatherer, an
indexer and an extractor. (1)
Electronic Gatherers or Crawlers
The gatherer visits websites and scans them for content, links, meta-tags and images.
It electronically fingerprints information objects, excluding duplicates and objects
from sites that are ineligible by policy. With millions of sites to visit, gatherers work
in parallel to speed up delivery. They are also called spiders, crawlers or knowledge
robots. It can take several days for a search engine to identify new links and web pages.
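The gatherer's loop can be illustrated with a minimal sketch. It is not how any particular search engine works: the in-memory PAGES table, the INELIGIBLE_SITES policy list and the choice of SHA-256 as the fingerprint are all assumptions made for this example, and the crawl runs sequentially rather than in parallel.

```python
import hashlib
from collections import deque

# Hypothetical in-memory "web": URL -> (text, outgoing links).
# A real gatherer would fetch pages over the network instead.
PAGES = {
    "a.example/home": ("Welcome to search engine notes",
                       ["a.example/about", "b.example/copy"]),
    "a.example/about": ("About gatherers and indexers",
                        ["a.example/home", "blocked.example/x"]),
    "b.example/copy": ("Welcome to search engine notes", []),  # duplicate content
    "blocked.example/x": ("Excluded by policy", []),
}
INELIGIBLE_SITES = {"blocked.example"}  # assumed policy list


def fingerprint(text):
    """Return a content fingerprint (SHA-256 here, as one possible choice)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def gather(seed):
    """Breadth-first visit from seed; returns fingerprint -> first URL seen."""
    seen_urls = {seed}
    queue = deque([seed])
    objects = {}
    while queue:
        url = queue.popleft()
        site = url.split("/", 1)[0]
        if site in INELIGIBLE_SITES or url not in PAGES:
            continue  # skip ineligible sites and unknown pages
        text, links = PAGES[url]
        objects.setdefault(fingerprint(text), url)  # a duplicate keeps the first URL
        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return objects


collected = gather("a.example/home")
print(sorted(collected.values()))  # ['a.example/about', 'a.example/home']
```

The duplicate page and the page on the ineligible site are both visited as links but never stored, which is the exclusion behavior described above.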
Electronic Indexers
The indexer takes over to classify content and map it to its context, creating a
database called a catalog or a knowledge repository. It cross-references content
and other objects by keyword, title, format, source, date and other attributes
specific to the search engine. Some indexers connect related objects using background
knowledge such as synonyms, clusters such as SIC classifications, or drill-down
hierarchical lexicons. In addition to these cross-references, smart engines also map
content to subject matter, company, geographic location, people or other contexts and
fuzzy sets. This clustering can create rich content/context relationships. Indexers
organize and store the product in the search-engine databases. They also work in
parallel. They use the frequency of past searches collected by extractors to create
new pointers and to build redundancy in high-density search areas, which alleviates
search-engine traffic jams.
Note that creating an index is a complex task even for humans, whose experience and
educational background influence the keywords, phrases and topics they choose to index.
Electronic indexers play it safe on keywords, but have a long way to go on
subject-matter cataloguing.
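As a rough illustration of the cross-referencing described in this section, the sketch below builds a small catalog keyed by keyword, source and year, and folds synonyms into a single term. The documents, the SYNONYMS table and the build_catalog function are invented for this example and do not reflect any specific engine.

```python
from collections import defaultdict

# Hypothetical documents and synonym table, invented for illustration.
DOCUMENTS = {
    1: {"title": "Car market outlook", "source": "a.example", "year": 2002},
    2: {"title": "Automobile safety report", "source": "b.example", "year": 2001},
    3: {"title": "Search engine basics", "source": "a.example", "year": 2002},
}
SYNONYMS = {"automobile": "car"}  # background knowledge: variant -> preferred term


def build_catalog(documents):
    """Cross-reference each document by keyword, source and year."""
    catalog = {
        "keyword": defaultdict(set),
        "source": defaultdict(set),
        "year": defaultdict(set),
    }
    for doc_id, doc in documents.items():
        for word in doc["title"].lower().split():
            term = SYNONYMS.get(word, word)  # fold synonyms into one term
            catalog["keyword"][term].add(doc_id)
        catalog["source"][doc["source"]].add(doc_id)
        catalog["year"][doc["year"]].add(doc_id)
    return catalog


catalog = build_catalog(DOCUMENTS)
print(sorted(catalog["keyword"]["car"]))        # [1, 2]
print(sorted(catalog["source"]["a.example"]))   # [1, 3]
```

Document 2 never uses the word "car", yet a keyword lookup for it succeeds because of the synonym mapping. Subject-matter cataloguing, the harder problem noted above, is outside the scope of this sketch.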
Extractors and Knowledge Brokers
Sometimes called knowledge brokers, extractors interact with users. They validate
requests to signal potential errors (typos, dates, Boolean operators), suggest
correct wording, then search the index and match the query with the right content.
They rank the results and report a summary of the findings along with URLs and
links for source tracing and detailed browsing. Smart extractors keep track of
search patterns, compositions, frequencies, hit rates and other statistics
to optimize the overall performance of the search engine. (2)
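The extractor's steps (validate, suggest, match, rank, record statistics) can be sketched as follows. The INDEX contents, the 0.7 similarity cutoff and the ranking by total keyword occurrences are assumptions chosen for this example. Real engines use far richer ranking signals.

```python
import difflib
from collections import Counter

# Hypothetical keyword index: term -> {document id: occurrences}.
INDEX = {
    "search": {"doc1": 3, "doc2": 1},
    "engine": {"doc1": 2, "doc3": 1},
    "crawler": {"doc2": 4},
}
query_log = Counter()  # how often each indexed term has been requested


def suggest(term):
    """Return the term if indexed, else the closest indexed term, else None."""
    if term in INDEX:
        return term
    matches = difflib.get_close_matches(term, list(INDEX), n=1, cutoff=0.7)
    return matches[0] if matches else None


def extract(query):
    """Validate terms, match them against the index, rank by total occurrences."""
    scores = Counter()
    corrections = {}
    for raw in query.lower().split():
        term = suggest(raw)
        if term is None:
            continue  # no plausible match: ignore the word
        if term != raw:
            corrections[raw] = term  # report the suggested wording
        query_log[term] += 1
        for doc_id, count in INDEX[term].items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked, corrections


results, fixes = extract("serch engine")
print(results)  # [('doc1', 5), ('doc2', 1), ('doc3', 1)]
print(fixes)    # {'serch': 'search'}
```

The query_log counter stands in for the search statistics that, as noted earlier, indexers can use to create new pointers in heavily searched areas.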
Leads for Further Research
For a practical coverage of intelligence tools and platforms, read Chapter 4 of
Harnessing the Power of Intelligence, Counterintelligence and Surprise Events
published by Executive.org.
UC Berkeley describes how search engines work and offers a list of recommended
search engines together with a useful table of their current features. (3)
Important
Detailed coverage of intelligence, counterintelligence, strategy, risk, F-Scale
and strategic negotiations is the subject of the management seminar Strategy,
Risk, Negotiation & Leadership. For seminar objectives, outline and upcoming
sessions in the US and Canada, visit www.executive.org.
Footnotes
1. Plain English words like gatherer (or electronic gatherer)
are easier to understand and recall than exotic terms like spider, crawler, scooter,
bot or search robot. Likewise, the term extractor has a higher power of designation
than the academic phrase "knowledge broker."
2. An interesting directory of search engines and indexes can be found at
www.netstrider.com/search/directory.html
3. Joe Barker: Recommended Search Engines: Table of Features;
Joe Barker: How Do Search Engines Work? Tutorials, UC Berkeley, 2002
www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html