Info - Search Engines.


The good news is that the World Wide Web consists of a large number of sites containing a wide variety of information. That's also the bad news. The whole of the WWW has been likened to a flea market: lots of good stuff scattered around in haphazard fashion.

So the natural question comes up: how do you find the particular piece of information you are looking for? One way would be to use a directory much like the telephone yellow pages. Such directories exist, but by the time they are printed, they are out of date. The preferred way, therefore, is to use the web itself and find information by electronic means. Search engines provide the means.

Indexes

The original web resource listings started out as "favorite sites" pages and built from there as people submitted their pages to the index site for inclusion in the list. As you might imagine, these lists quickly became quite large. You can still find index sites where you start with high-level topics and drill down through listings that are progressively more detailed. But most now use search routines: you provide key words relating to the information you are looking for, and the search engine finds sites in its database that match those words. You are then presented with a list of these sites with links for instant jumps to sites of interest.
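
As a rough sketch of how a key word lookup like this might work, the Python below builds a small index that maps each word to the pages containing it. The page addresses, the page text, and the function names are all made up for illustration; a real search site's database is far more elaborate.

```python
from collections import defaultdict

# Hypothetical mini "database": page address -> page text.
PAGES = {
    "golf.example/scotland": "Golf courses in Scotland",
    "fish.example/lochs": "Fishing the lochs of Scotland",
    "golf.example/clubs": "Choosing golf clubs",
}

def build_index(pages):
    """Map each lower-cased word to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def lookup(index, keyword):
    """Return the pages listed under a key word, sorted for display."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
print(lookup(index, "Scotland"))
```

The work of reading every page is done once, when the index is built. Each search afterwards is only a quick lookup.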

Search Engines

How do search engines get the information in their database? It depends on the engine. Some rely on inputs from web creators who submit requests. There are even central sites that collect various information from web page authors and then submit that information to the various search engines they serve.

One of the more common methods of collecting information now, however, is via an electronic search by the engine itself. Viewed from a distance, the WWW would look exactly like a web: a set of fully interconnected information nodes. To search this vast information web, search engine sites use "spiders" that move from node to node, searching out each one and cataloging what's found there. (Some collection engines are also referred to as "bots," short for robots.) But as the web grows, this method of collecting information is starting to bog down. It now takes up to a month just to make a single pass through the web. New methods are going to have to be developed.
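
To give a feel for what a spider does, here is a minimal sketch in Python. It assumes the "web" is just a small table of page names and the links found on each page (all invented here), and it visits pages breadth first, which is only one of several possible visiting orders. A real spider fetches pages over the network, which is where the time goes.

```python
from collections import deque

# Hypothetical link graph: each page lists the pages it links to.
LINKS = {
    "home": ["news", "sports"],
    "news": ["home", "weather"],
    "sports": ["golf"],
    "weather": [],
    "golf": ["sports"],
}

def crawl(start, links):
    """Visit every page reachable from start, breadth first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)          # "catalog" the page
        for target in links.get(page, []):
            if target not in seen:  # skip pages already found
                seen.add(target)
                queue.append(target)
    return order

visited = crawl("home", LINKS)
print(visited)
```

The `seen` set is what keeps the spider from going around in circles when pages link back to one another.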

Search engines are a valuable resource, but if you feed them inappropriate key words you can be left as much in the dark as when you started. Getting 50,000 hits for a particular search strategy is hardly effective (although, even with that many responses they are usually ranked so you often don't have to wade through all of them). That's where knowing how to search can come in handy.

Search Strategy

There is no "standard" search engine language, but there are a few common concepts that can help you with most every search site you might use.

     First, understand that most engines consider each word unique. If you want to search on a phrase, you have to tell the engine, usually by enclosing the phrase in quotes. To find Computer Knowledge, therefore, you would enter "Computer Knowledge" instead of the phrase with no quotes (which would find every page containing either the word computer or the word knowledge, a considerable number).
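
The difference between the two kinds of search can be sketched in a few lines of Python. The three sample pages and the function names are invented for illustration, and the matching is deliberately simple (words are split on spaces only).

```python
# Hypothetical pages: label -> page text.
PAGES = {
    "a": "Computer Knowledge is a tutorial site",
    "b": "My computer is slow",
    "c": "Knowledge of the computer trade",
}

def words(text):
    return text.lower().split()

def match_any_word(query, text):
    """True if the page contains at least one of the query words."""
    page_words = set(words(text))
    return any(w in page_words for w in words(query))

def match_phrase(query, text):
    """True if the query words appear side by side, in order."""
    q, t = words(query), words(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

loose = sorted(p for p, t in PAGES.items() if match_any_word("computer knowledge", t))
exact = sorted(p for p, t in PAGES.items() if match_phrase("computer knowledge", t))
print(loose)  # all three pages
print(exact)  # only page "a"
```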

     If there is any doubt, use lower case throughout your search. Search engines understand this to mean you will accept any capitalization in response. Also, don't forget that often "*" will mean "anything" and so "search*" would match search, searching, and other endings.
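
Here is a small Python sketch of both ideas. It assumes a rule that some engines follow: an all-lower-case term accepts any capitalization, while a term containing capitals must match exactly. Not every engine behaves this way. The standard `fnmatch` module supplies the "*" matching; `term_matches` is a made-up helper name.

```python
from fnmatch import fnmatchcase

def term_matches(pattern, word):
    """Match one query term against one word from a page."""
    # Assumed rule: a lower-case pattern ignores the word's capitalization.
    if pattern == pattern.lower():
        word = word.lower()
    return fnmatchcase(word, pattern)

WORDS = ["Search", "searching", "SEARCHES", "research"]
print([w for w in WORDS if term_matches("search*", w)])
print([w for w in WORDS if term_matches("Search*", w)])
```

Note that "research" is not matched by "search*", because the "*" stands for anything after the letters given, not before them.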

     You can often combine words in a variety of ways using the Boolean notation: AND, OR, and NOT. If you just place a list of words in the search dialog most search engines will assume an OR between each one. This results in the maximum number of hits for the topic(s) you are interested in but often returns much chaff with the wheat. The AND and NOT operators help here. Your searches can get fairly complicated if you wish.
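
One way to picture the three operators is as operations on sets of pages. The Python sketch below uses three invented pages: OR becomes a union, AND an intersection, and NOT a subtraction. It illustrates the logic only, not how any particular engine is built.

```python
# Hypothetical pages: label -> page text.
PAGES = {
    "a": "golf holidays in scotland",
    "b": "fishing and golf in scotland",
    "c": "fishing in norway",
}

def pages_with(word):
    """Return the set of pages whose text contains the word."""
    return {p for p, text in PAGES.items() if word in text.split()}

either = pages_with("golf") | pages_with("fishing")  # golf OR fishing
both = pages_with("golf") & pages_with("scotland")   # golf AND scotland
without = both - pages_with("fishing")               # ... NOT fishing

print(sorted(either))
print(sorted(both))
print(sorted(without))
```

The OR search returns all three pages, the AND search two, and adding NOT leaves just one: the wheat without the chaff.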

     Frequently a site will also allow you to specify that a certain term be a mandatory part of the search instead of optional. Some, in advanced searches, also allow you to specify a weight for a specific term (i.e., consider one word as being three times as important as another).

     For example: With AltaVista, placing a plus sign "+" before a word means that word is to be considered a mandatory part of the search, and a minus sign "-" means that word is to be specifically excluded from the search. E.g., "+scotland +golf -fishing" would find all pages that mentioned both Scotland and golf but not fishing.
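
The plus and minus notation can be sketched in Python as follows. This is an illustration of the idea only, not AltaVista's actual code; the function names are made up, and terms with no sign are simply collected as optional.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)
    return required, excluded, optional

def matches(query, text):
    """True if the text has every required term and no excluded term."""
    required, excluded, _optional = parse_query(query)
    page_words = set(text.lower().split())
    return (all(t in page_words for t in required)
            and not any(t in page_words for t in excluded))

print(matches("+scotland +golf -fishing", "Golf in Scotland"))
print(matches("+scotland +golf -fishing", "Golf and fishing in Scotland"))
```

The first page is accepted; the second is rejected because it mentions fishing.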

     Another helpful way to cut down on responses is to use any proximity commands the search engine might have. Some will allow you to specify that your search words must be within "X" number of words of one another in order to count as a hit. Check the search engine's help pages to see whether the engine you use has this feature.
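
A proximity test is easy to sketch in Python. The function name `within` and the two sample sentences are invented for illustration; each engine has its own command and its own way of counting the distance.

```python
def within(text, first, second, max_gap):
    """True if the two words occur within max_gap words of each other."""
    tokens = text.lower().split()
    first_at = [i for i, w in enumerate(tokens) if w == first]
    second_at = [i for i, w in enumerate(tokens) if w == second]
    return any(abs(i - j) <= max_gap for i in first_at for j in second_at)

near = "golf courses in scotland are famous"
far = "golf is popular in many countries and also somewhat in scotland"

print(within(near, "golf", "scotland", 3))
print(within(far, "golf", "scotland", 3))
```

Both sentences contain both words, but only the first counts as a hit when the words must be within three words of one another.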

     If you are not multi-lingual and the search engine you use has the option of displaying a single language or all languages, pick the single language you understand best. That will eliminate many pages you will simply have to skip over if you don't understand them.

     Finally, some search engines have reserved terms for specific searches. Use them if, for example, you are searching for those pages on the web that link to your page(s). Refer to the help pages of the search engine to find the terms for that engine.

Over time and with experimentation you will find the one or two search engines you prefer and find most useful. Keep them in your bookmark file as you will use them often.

The Future of Searching

As indicated earlier, the simple index search and the construction of Boolean search requests are fast becoming impractical because of the growth of the web. So what's up and coming? Here are a few hints based on technology in development, in testing, or coming.

     Collaborative Search. The theory here is that what the majority want, you will want. So, the results you see from your request will be the results that produced the maximum number of click-throughs from others who have searched using the same or similar search terms. In a similar vein, there are now programs that track where users go from particular pages and present those locations as options when you go to that page.
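
In its simplest form, the click-through idea amounts to counting. The Python sketch below assumes a made-up log of which page each searcher clicked and ranks pages by the number of clicks for the same search terms. Matching "similar" terms, as mentioned above, is left out.

```python
from collections import Counter

# Hypothetical click log: (search terms, page the user then clicked).
CLICK_LOG = [
    ("scotland golf", "standrews.example"),
    ("scotland golf", "standrews.example"),
    ("scotland golf", "golftours.example"),
    ("scotland fishing", "lochs.example"),
    ("scotland golf", "standrews.example"),
]

def rank_by_clicks(query, log):
    """List pages clicked for this exact query, most-clicked first."""
    clicks = Counter(page for terms, page in log if terms == query)
    return [page for page, _count in clicks.most_common()]

print(rank_by_clicks("scotland golf", CLICK_LOG))
```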

     Natural-Language Search. This is just a front-end modification that allows users to submit their queries in the form of regular questions instead of having to form a Boolean search query. (Some search engines are starting to implement this type of search.)
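
A very crude front end of this kind can be sketched in Python by stripping punctuation and throwing away common "filler" words, leaving key words for an ordinary search. The word list and function name here are assumptions made for illustration; real natural-language front ends do a good deal more than this.

```python
# Hypothetical list of filler words to discard from a question.
STOP_WORDS = {"what", "where", "how", "is", "are", "the", "a", "an",
              "of", "in", "do", "i", "can", "to"}

def question_to_keywords(question):
    """Reduce a plain-English question to a list of key words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in question.lower())
    return [w for w in cleaned.split() if w not in STOP_WORDS]

print(question_to_keywords("Where can I play golf in Scotland?"))
```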

     Media Search. Searches now present results based on searches of web page text. But, what if you want to find a particular picture? Now you have to largely depend upon the webmaster putting the description on their page along with the picture. In the future, you'll be able to search on picture characteristics. Other media (e.g., sound) will also be included as time goes on. (Some search engines are starting to implement this type of search.)

     XML Search. The next generation web language, XML, has extensive capability for categories built into it. This allows web page designers to insert keywords into a category structure so that searches based on XML can yield more accurate results. If you are looking for "Word" as a product instead of a concept, XML searching can give this to you.
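
To illustrate the "Word" example, the Python sketch below searches a tiny piece of XML for a key word within a given category. The element names (`pages`, `page`, `keyword`) and the `category` attribute are invented here; they are not part of any XML standard, and real pages could organize their categories quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are made up.
CATALOG = """
<pages>
  <page url="office.example/word">
    <keyword category="product">Word</keyword>
  </page>
  <page url="grammar.example/parts">
    <keyword category="concept">word</keyword>
  </page>
</pages>
"""

def search(xml_text, term, category):
    """Return the pages that list the term under the given category."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for page in root.iter("page"):
        for kw in page.iter("keyword"):
            same_term = (kw.text or "").lower() == term.lower()
            if same_term and kw.get("category") == category:
                hits.append(page.get("url"))
                break  # one hit per page is enough
    return hits

print(search(CATALOG, "word", "product"))
```

A plain text search for "word" would return both pages; the category narrows it to the one about the product.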

     Context Search. It's likely that a search hit found in the midst of related terms will be a much more meaningful hit than one isolated amidst non-related text. New search engines are being developed to perform just these kinds of analyses.
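
One simple way to picture such an analysis: count how many related terms appear within a few words of the hit. The Python sketch below does just that. The related-term list, the window of five words, and the sample sentences are all assumptions made for illustration, not a description of any engine under development.

```python
def context_score(text, term, related, window=5):
    """Count related words near the best occurrence of the term."""
    tokens = text.lower().split()
    best = 0
    for i, word in enumerate(tokens):
        if word == term:
            nearby = tokens[max(0, i - window):i + window + 1]
            best = max(best, sum(1 for w in nearby if w in related))
    return best

# Hypothetical terms related to "Word" the product.
RELATED = {"software", "document", "microsoft", "processor"}
product_page = "microsoft word is a document processor software package"
other_page = "he gave his word that the boat would be ready"

print(context_score(product_page, "word", RELATED))
print(context_score(other_page, "word", RELATED))
```

Both pages contain the search term, but the first scores much higher because the hit sits in the midst of related terms.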

Of course, there is still the problem of the search engine actually combing the web and finding things to search. That's still not an easy question to address.

For more in-depth information on Search Engines, read this.