Scientists Try To Teach Computers To Read
AMHERST, N.Y. (AP) - While computer scientists consider ways to make Americans more computer literate, a university professor is trying to make America’s computers literate.
Teaching computers to read involves teaching them to “think” like humans - by learning from experience and making educated guesses. But what most people learn easily at age 6 is proving far more difficult for computers.
Humans “don’t just do character recognition. We don’t just look at one thing,” said Sargur Srihari, a professor at the State University of New York’s Buffalo campus. “It’s going to be many, many years before computers can read documents such as handwritten letters.”
For now, Srihari and his team of about 30 assistants would be satisfied if the computer could read the addresses on the envelopes those letters came in. So would the Postal Service, which is supporting their project with a $2.1 million grant.
The central post office in Buffalo, for example, must sort up to 3.3 million pieces of mail a night. Its existing equipment could process up to 42,000 letters an hour - if the addresses were perfectly printed. But those machines reject more than 30 percent of the mail as unreadable, said Dennis Wnuk, a postal operations officer in Buffalo.
“Primarily, what our optical character readers can read is business-type mail, preprinted mail,” Wnuk said. “The average piece of mail that a residential customer will put into a collection box, we’ll only read about 25 percent of those - and that’s only if he prints very well or happens to have a typewriter.”
Business mail is also a problem if it comes in a gaudily printed envelope, if the address is too high, or if it is folded so that part of the letter shows through the window along with the address, said John Gullo, an automation and readability specialist at the post office.
“People who use printers, they don’t like to change the ribbon because they want to get every last address out of the ribbon. That gives us trouble,” he said.
The first major problem facing Srihari’s team is to find the address itself - simple for humans, but bewildering for computers that must separate the address from the “noise” on such things as magazine covers and junk-mail envelopes.
The SUNY computer zeroes in on text blocks with the proper shape - those with lines flush to the left, for example - and has already achieved a 90 percent success rate.
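The shape test described above might look something like the following minimal sketch. It is not the SUNY team's method; the `TextBlock` structure, the pixel coordinates and every threshold are assumptions made up for illustration. The idea is only that lines flush to the left start at nearly the same horizontal position, so the spread of their starting points is small.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import pstdev


@dataclass
class TextBlock:
    """A candidate region of an envelope image (hypothetical structure)."""
    name: str
    line_left_edges: list[float]  # x position where each text line starts, in pixels


def looks_like_address(block: TextBlock,
                       min_lines: int = 2,
                       max_lines: int = 6,
                       max_left_jitter: float = 4.0) -> bool:
    """Return True if the block has a plausible line count and a flush-left margin.

    All thresholds are illustrative assumptions, not measured values.
    """
    n = len(block.line_left_edges)
    if not (min_lines <= n <= max_lines):
        return False
    # Flush-left lines start at nearly the same x, so the spread is small.
    return pstdev(block.line_left_edges) <= max_left_jitter


def find_address_blocks(blocks: list[TextBlock]) -> list[TextBlock]:
    """Keep only the blocks that pass the shape test."""
    return [b for b in blocks if looks_like_address(b)]


if __name__ == "__main__":
    candidates = [
        TextBlock("address", [120.0, 121.0, 119.5]),     # flush-left lines
        TextBlock("slogan", [40.0, 95.0, 60.0, 150.0]),  # ragged or centered text
        TextBlock("postmark", [300.0]),                  # a single line
    ]
    print([b.name for b in find_address_blocks(candidates)])  # ['address']
```

A real system would presumably weigh several shape cues together; this sketch checks only line count and left alignment.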
Next, the computer must figure out as much of the written address as possible.
To recognize a ‘2,’ for example, a computer could be taught to look for ends in the top center and lower right, a curve in the upper right and a sharp ‘V’ bend in the lower left, said Alan Commike, a graduate student who is handling the number-recognition aspect of the project.
The problem is that there are so many different ways to write numerals that 130 of those rules are needed to sort them all out.
“There are twos with holes,” or loops instead of points, Commike said. “There are fives with hats, fives without hats. There are British sevens and American sevens.”
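A rule-based approach of the kind Commike describes can be sketched roughly as follows. The feature names are hypothetical labels invented here, the “7” rule is purely illustrative, and only three rules are shown rather than the 130 the project is said to need. Each rule lists the stroke features that must all be present, and a numeral may have more than one rule to cover different writing styles.

```python
from __future__ import annotations

# Each rule pairs a digit with the stroke features that must all be present.
# Feature names are hypothetical labels invented for this sketch.
RULES: list[tuple[str, frozenset[str]]] = [
    # The "2" described in the article: ends at top center and lower right,
    # a curve in the upper right, a sharp V bend in the lower left.
    ("2", frozenset({"end_top_center", "end_lower_right",
                     "curve_upper_right", "v_bend_lower_left"})),
    # A "2" written with a loop (a "hole") instead of a point.
    ("2", frozenset({"end_top_center", "end_lower_right",
                     "curve_upper_right", "loop_lower_left"})),
    # An illustrative "7"; the features are assumed, not taken from the project.
    ("7", frozenset({"end_top_left", "end_bottom_center",
                     "corner_upper_right"})),
]


def classify(features: set[str]) -> set[str]:
    """Return every digit whose rule is fully satisfied by the observed features."""
    return {digit for digit, required in RULES if required <= features}


if __name__ == "__main__":
    looped_two = {"end_top_center", "end_lower_right",
                  "curve_upper_right", "loop_lower_left"}
    print(classify(looped_two))        # {'2'}
    print(classify({"end_top_left"}))  # set(), nothing matched
```

An empty result means no rule fired, which in this sketch stands in for a numeral the machine would have to reject as unreadable.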
The next step is to teach the computer to make educated guesses to fill in unreadable blanks. For example, if the street address is not entirely legible, the computer narrows it to a few possibilities by matching the part it can read against a list of all streets and numbers in that particular ZIP code.
“This is what is called using contextual information . . . bringing knowledge to bear, which is what people do,” Srihari said.
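The street-matching step can be illustrated with a small sketch. It assumes, for simplicity, that the recognizer marks each unreadable character with a `?` and knows how many characters there are; the ZIP code, the street names and the function names are all hypothetical. The partial reading is compared against every street on file for the ZIP code, and only the streets that agree at every readable position survive.

```python
from __future__ import annotations

# Hypothetical street directory keyed by ZIP code; real data would come
# from a postal address database.
STREETS_BY_ZIP: dict[str, list[str]] = {
    "00000": ["MAIN ST", "MAPLE RD", "MILL ST", "OAK AVE"],
}

UNREADABLE = "?"  # placeholder for a character the recognizer could not read


def matches(partial: str, candidate: str) -> bool:
    """True if candidate agrees with partial at every readable position."""
    if len(partial) != len(candidate):
        return False
    return all(p == UNREADABLE or p == c for p, c in zip(partial, candidate))


def candidate_streets(zip_code: str, partial: str) -> list[str]:
    """Narrow a partly legible street name to the streets known in that ZIP code."""
    return [s for s in STREETS_BY_ZIP.get(zip_code, []) if matches(partial, s)]


if __name__ == "__main__":
    print(candidate_streets("00000", "M??? ST"))  # ['MAIN ST', 'MILL ST']
    print(candidate_streets("00000", "MA?? ST"))  # ['MAIN ST']
```

The more characters the machine can read, the shorter the list of possibilities becomes, which is the narrowing the article describes.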
Srihari’s computers can now read about 75 percent of handwritten ZIP codes; current postal equipment can read only about 5 percent of them. But the new process is painfully slow - up to a minute for each piece of mail.
Speeding it up will require specially designed hardware and is at least 18 months away, he said.
The researchers are testing their equipment with the real thing.
“We had a team of undergraduates working nights in the Buffalo post office,” Srihari said. “We didn’t delay anybody’s mail. We just took it for a few minutes, captured it and put it in our database.”
Eventually, Srihari said, the technology will have a variety of applications. For example, a busy newspaper reader could feed his paper into the computer, which would cull out only those stories of interest to him. Office workers could do the same with memos.
Similar “videotex” systems were never very successful. And the prediction that computers would eventually create a paperless office was wrong: if anything, computers have made offices even more papery.
Srihari said his project could someday give people the best of both worlds: they could have their paper and computerize it, too.
“People are not going to get away from hard copy. People like to have hard copy,” he said. “It’s something tangible. It’s nice to be able to take it somewhere.”
End Adv for Monday, Oct. 1