Study: Net Bigger Than We Think
Jul. 26, 2000
SAN FRANCISCO (AP) _ It turns out that cyberspace, like the expanding universe, is more unfathomable than we imagined.
The Internet has become so large so fast that sophisticated search engines are just scratching the surface of the Web's vast information reservoir, according to a new study released Wednesday.
The 41-page research paper, prepared by a South Dakota company that has developed new software to plumb the Internet's depths, estimates that the World Wide Web is 500 times larger than the maps provided by popular search engines like Yahoo!, AltaVista and Google.com.
These hidden information coves, well known to the Net savvy, have become a tremendous source of frustration for researchers who can't find the information they need with a few simple keystrokes.
``These days it seems like search engines are a little like the weather: Everyone likes to complain about them,'' said Danny Sullivan, editor of SearchEngineWatch.com, which analyzes search engines.
For years, the Internet's uncharted territory has been dubbed the ``invisible Web.''
BrightPlanet, the Sioux Falls, S.D., start-up behind Wednesday's report, describes the terrain as the ``deep Web'' to distinguish it from the surface information captured by Internet search engines.
``It's not an invisible Web anymore. That's what's so cool about what we are doing,'' said Thane Paulsen, BrightPlanet's general manager.
Many researchers suspected that these underutilized outposts of cyberspace represented a substantial chunk of the Internet, but no one seems to have explored the Web's back roads as extensively as BrightPlanet.
Deploying new software developed over the past six months, BrightPlanet estimates there are now about 550 billion documents stored on the Web. Combined, Internet search engines index about 1 billion pages. One of the first Web search engines, Lycos, had an index of 54,000 pages in mid-1994.
While search engines obviously have come a long way since 1994, they aren't indexing even more pages because an increasing amount of information is stored in evolving, giant databases set up by government agencies, universities and corporations.
Search engines rely on technology that generally identifies ``static'' pages, rather than the ``dynamic'' information stored in databases.
This means that general-purpose search engines will guide users to the home site that houses a huge database, but finding out what's in it requires additional queries.
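The distinction can be illustrated with a minimal sketch in Python. It is not taken from the BrightPlanet study; the site layout, page names and records below are hypothetical, and the crawler is a deliberately simplified stand-in for how a link-following search engine might work.

```python
# Hypothetical in-memory "site": static pages link to one another,
# while database records are reachable only by submitting a query.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a query form; it links to no individual records
}

# Hypothetical "dynamic" content held in a database behind /search.
DATABASE = {
    "record-1": "soil survey, 1998",
    "record-2": "patent filing, 1999",
    "record-3": "census table, 2000",
}


def crawl(start):
    """Follow hyperlinks only, as a simplified link-based crawler would."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop()
        if page in seen:
            continue
        seen.add(page)
        queue.extend(STATIC_PAGES.get(page, []))
    return seen


def query_database(term):
    """Stand-in for submitting the site's search form directly."""
    return sorted(key for key, text in DATABASE.items() if term in text)


indexed = crawl("/")
print(sorted(indexed))         # pages the crawler reaches by following links
print(query_database("1999"))  # records that surface only through a query
```

In this sketch the crawler reaches the search form itself but none of the records behind it, which is the gap the article describes: the database's home page gets indexed, while its contents surface only when someone submits a query.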
BrightPlanet believes it has developed a solution with software called ``LexiBot.'' With a single search request, the technology not only searches the pages indexed by traditional search engines, but delves into the databases on the Internet and fishes out the information contained in them.
The LexiBot isn't for everyone, BrightPlanet executives concede. For one thing, the software costs money _ $89.95 after a free 30-day trial. For another, a LexiBot search isn't fast. Typical searches will take 10 to 25 minutes to complete, but could require up to 90 minutes for the most complex requests.
``If you are frustrated about what you can't find on the Internet, then you are a target audience,'' Paulsen said. ``This isn't for grandma when she is looking for chocolate chip recipes on the Internet.''
The privately held company, which is trying to raise money from venture capitalists, expects LexiBot to be particularly popular in academic and scientific circles. The company also plans to sell its technology and services to businesses looking to provide relevant information for visitors to their sites.
About 95 percent of the information stored in the deep Web is free, according to BrightPlanet. Much of it is technical information that is extraordinarily useful, researchers said. The company has listed 20,000 of the ``content-rich'' databases uncovered by LexiBot on a Web site, completeplanet.com.
Another Web site, invisibleweb.com, already offers a similar directory of large databases on the Internet.
Despite some grumbling, most mainstream Internet users seem satisfied with the free search engines that serve as the Web's road map.
In a survey of 33,000 search engine users earlier this year, NPD New Media Services found that 81 percent of the respondents said they find what they are looking for all or most of the time.
That was an improvement from 77 percent of search engine users reporting a positive experience in the fall of 1999. Only 3 percent of the search engine users said they never find what they want.
Several Internet veterans who reviewed BrightPlanet's research Wednesday were intrigued by the company's software, but warned that it could be overwhelming.
``The World Wide Web is getting to be so humongous that you need specialized engines. A centralized approach like this isn't going to be successful,'' predicted Carl Malamud, co-founder of Petaluma-based Invisible Worlds.
Like BrightPlanet, Invisible Worlds is trying to extract more data hidden from search engines, but is customizing the information. Malamud calls this process ``giving context to the content.''
Sullivan agreed that BrightPlanet's greatest challenge will be showing businesses and individuals how to effectively deploy the company's breakthrough.
``No one else has come up with something like this yet, so when they fetch people all this information on the deep Web, they are going to have to show people where to dive in. Otherwise, people will just drown.''
ON THE NET:
Research paper on the deep web,