Computerworld
QuickStudy: Web Harvesting

 


June 21, 2004 (Computerworld) -- It's hard to argue with the proposition that the World Wide Web is the largest repository of information that has ever existed. In just over a decade, the Web has moved from a university curiosity to a fundamental research, marketing and communications vehicle that impinges upon the everyday life of most people in the developed world. But there's a catch, of course. As the amount of information on the Web grows, that information becomes ever harder to keep track of and use.



This vast amount of freely available information is spread over billions of Web pages, each with its own independent structure and format. So how do you find the information you're looking for in a useful format, and do it quickly and easily without breaking the bank?

Search Isn't Enough

Search engines are a big help, but they can do only part of the work, and they are hard-pressed to keep up with daily changes. For all the power of Google and its kin, all that search engines can do is locate information and point to it. They go only two or three levels deep into a Web site to find information and then return URLs. They also find and return meta descriptions and meta keywords embedded in Web pages, but these may well be inaccurate.
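To make the point about meta descriptions and meta keywords concrete, here is a minimal sketch of how a program might read them, using only Python's standard `html.parser` module. The sample page, the `MetaTagParser` class and the `read_meta` helper are illustrative assumptions, not part of any search engine's actual code.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        # Keep only tags that carry both a name and a content value.
        if name and content is not None:
            self.meta[name.lower()] = content


# Hypothetical page: the meta tags promise more than the body delivers.
SAMPLE_PAGE = """<html><head>
<title>Widget prices</title>
<meta name="description" content="The best deals on everything">
<meta name="keywords" content="cheap, free, bargains">
</head><body><p>Widget A costs $19.99.</p></body></html>"""


def read_meta(html):
    """Return a dict of meta tag names to their content strings."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


meta = read_meta(SAMPLE_PAGE)
print(meta["description"])
print(meta["keywords"])
```

In this made-up page the meta tags describe "everything" while the body mentions a single product, which is the kind of mismatch that makes such tags an unreliable guide to a page's real content.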

Consider that even when you use a search engine to locate data, you still have to do the following tasks to capture the information you need:

  • Scan the content until you find the information.
  • Mark the information (usually by highlighting with a mouse).
  • Switch to another application (such as a spreadsheet, database or word processor).
  • Paste the information into that application.

A better solution, especially for companies that are aiming to exploit a broad swath of data about markets or competitors, lies with Web harvesting tools.

Web harvesting software automatically extracts information from the Web and picks up where search engines leave off, doing the work the search engine can't. Extraction tools automate the reading, copying and pasting necessary to collect information for analysis, and they have proved useful for pulling together information on competitors, prices and financial data of all types.
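The scan, mark, switch and paste routine described earlier can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it uses Python's standard `html.parser` and `csv` modules, works on a made-up price page held in a string rather than a live site, and the names `TableRowParser`, `harvest_table` and `rows_to_csv` are hypothetical. Commercial harvesting tools are considerably more elaborate.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # "Scan" and "mark": keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical competitor price page.
SAMPLE_PAGE = """<html><body><h1>Competitor price list</h1>
<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>Widget A</td><td>19.99</td></tr>
<tr><td>Widget B</td><td>24.50</td></tr>
</table></body></html>"""


def harvest_table(html):
    """Return the table rows found in the page as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


def rows_to_csv(rows):
    """'Paste' the rows into CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = harvest_table(SAMPLE_PAGE)
print(rows_to_csv(rows))
```

The parser stands in for the reading and highlighting a person would do by hand, and the CSV step replaces the switch to a spreadsheet and the paste. A real harvester would also need to fetch pages over the network and cope with each site's own layout.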



