Computers harness language translation

Afghan Army doctors in a clinic adjacent to Forward Operating Base Lightning and outside the city of Gardez in Paktia Province, Afghanistan, received copies of the critical care manual translated during a partnership between doctors who were in the region and U.S. Army Research Laboratory researchers using computer translation technology. (U.S. Army photo)

By Joyce P. Brayboy, ARL Public Affairs

While leading a medical training team in Kabul, Afghanistan, a U.S. Navy commander became frustrated as he faced the challenge of interpreting complex medical information.

Commander Kurt Henry was seeing cases of intestinal tuberculosis that he knew were treatable, but the regional hospital’s critical care unit did not have medical manuals to provide treatment instruction for newly assigned doctors.

When he scanned the Internet for documentation about treatment options, he only came across information written in English. His team spoke the native language of the Afghan people, Dari, recalled Steve LaRocca, computer scientist and team chief at the U.S. Army Research Laboratory.

Now, almost seven years later, the situation is better for medical trainers because of statistical machine translation methods that cut down on the Army’s reliance on human translators in projects that require massive amounts of translation.

By early 2012, the ARL had provided 500 printed English-Dari special trainers’ editions of the critical care reference manual to doctors in hospitals and clinics throughout Afghanistan to meet the need for medical teams like Henry’s.

More and different manuals have since been translated, printed and shipped, and another priority translation is currently nearing completion.

ARL computer scientists and the newly assigned Afghan doctors have carefully translated and collected more than 6,000 Dari medical phrases over the course of the initial project.

Secondary products, including an Android “Army Phrase Book” app, have been developed to make broader use of the expertise captured in the translated phrases.

Without computational support, translators would speak into a recorder for an hour to extract small bits of data, LaRocca said.

“The challenge was working with a limited pool of potential translators who were familiar with Dari, a less commonly taught language, and who also understood medical jargon,” LaRocca said.

Speech recognition technology was LaRocca’s specialty when he retired from West Point as a language professor and founding director for the Center for Technology Enhanced Language Learning in 2004.

While he wore the uniform, LaRocca advised military leaders on getting the most from limited translation resources, with the understanding that “there is no way our language-qualified people could give all the capacity we need in theater.”

At ARL, his team explores ways to harness the knowledge of linguists by capturing hundreds of hours of translations stored in databases where the translated sentences could be shared and reused.
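The idea of storing translated sentences so they can be shared and reused is often called a translation memory. The following is a minimal sketch of that idea, not ARL's actual system: the `TranslationMemory` class, its method names, the 0.8 similarity threshold and the placeholder target strings are all assumptions made for illustration. It stores source-target sentence pairs, returns an exact match when one exists, and otherwise falls back to the closest stored sentence using Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical store of previously translated sentence pairs."""

    def __init__(self):
        # Maps a normalized source sentence to its stored translation.
        self._pairs = {}

    @staticmethod
    def _normalize(text):
        # Lowercase and collapse whitespace so trivial differences still match.
        return " ".join(text.lower().split())

    def add(self, source, target):
        """Save a human-produced translation for later reuse."""
        self._pairs[self._normalize(source)] = target

    def lookup(self, source, threshold=0.8):
        """Return (translation, score); translation is None below the threshold."""
        key = self._normalize(source)
        if key in self._pairs:
            return self._pairs[key], 1.0

        # Fuzzy fallback: find the most similar stored source sentence.
        best_target, best_score = None, 0.0
        for stored, target in self._pairs.items():
            score = SequenceMatcher(None, key, stored).ratio()
            if score > best_score:
                best_target, best_score = target, score

        if best_score >= threshold:
            return best_target, best_score
        return None, best_score


# Placeholder strings stand in for real Dari translations.
tm = TranslationMemory()
tm.add("Check the patient's airway.", "<Dari translation 1>")
tm.add("Record the blood pressure every hour.", "<Dari translation 2>")

print(tm.lookup("Check the patient's airway first."))
print(tm.lookup("Order a chest X-ray."))
```

In a sketch like this, a near match would be handed to a human translator as a starting point rather than used as is, and a miss would signal that a new translation is needed and should be added to the store afterward.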

The laboratory applies statistical machine translation methods to specialized Army problems where there is not a commercially available solution, said Melissa Holland, chief for ARL’s multi-lingual computing research program.

“Computers could never replace the human translator, but we look for ways to relieve some of the burden, especially in less commonly used languages, like Dari, Pashto and Serbian,” Holland said.

The multilingual computing group addresses challenges with medical, legal and Army training translations, she said. The information used in translating the medical phrases is kept in a database for use across the Defense community.

Computer translation breakthroughs in the last decade, along with the Dari datasets, greatly reduced the projects’ dependence on the small number of bilingual human translators who are also subject matter experts. Computers remember and reuse expert knowledge.

“We’ve had people translating every day in Korea since about 1951, but we didn’t save the datasets over those decades,” LaRocca said. “The knowledge generated by all those people over all those years is gone.”

He said, “If we had the presence of mind to curate that data or prepare it for the eventual use of technology, we would be so much better off in that language and many others.”

LaRocca embraced the idea of capturing and saving datasets from projects in the Dari and Pashto languages.

He is not the only one. Lt. Col. Forest Kim led a team of medical advisors under the surgeon general in Afghanistan from November 2013 to May 2014. His team had seven language translators, but he said there was not enough time or assets to translate large volumes of text.

His team circulated discs and DVDs to train medical trainers in the region.

“We were making a lot of changes, but I knew we were going to leave,” Kim said. “We had to get to the point of serving the force in a supporting role.”

Kim made it a priority to capture and upload all of the medical advisory documents to one central database. But he did not have a way to translate this information to other languages at the time.

ARL computer translation experts hope to expand the military’s ability to translate volumes of critical data, LaRocca said.

The Army Program Office associated with translation technology anticipates an Army need for three new languages a year and expanding domains to include legal, criminal justice, military training and medical, he said.

“We have developed a way to curate data as fast as we translate it. We also have developed more than one way of capturing and reusing language data,” he said.

“Although the manual may be worn in 10 years, the datasets captured from the translations will live on and be valuable for decades to come,” LaRocca said.

When Kim was in Afghanistan, the physicians gave him a manual, translated from Russian at least 40 years earlier, as an example of what they use for emergency war surgery.

“When U.S. forces are gone from the region, the U.S. documents will remain. As I see it, what ARL has done translates to tremendous training value to the physicians, as well as goodwill to the nation,” he said.

---

This article appears in the May/June 2015 issue of Army Technology Magazine, which focuses on Future Computing. The magazine is available as an electronic download, or print publication. The magazine is an authorized, unofficial publication published under Army Regulation 360-1, for all members of the Department of Defense and the general public.

The Army Research Laboratory is part of the U.S. Army Research, Development and Engineering Command, which has the mission to develop technology and engineering solutions for America’s Soldiers.

RDECOM is a major subordinate command of the U.S. Army Materiel Command. AMC is the Army’s premier provider of materiel readiness–technology, acquisition support, materiel development, logistics power projection and sustainment–to the total force, across the spectrum of joint military operations. If a Soldier shoots it, drives it, flies it, wears it, eats it or communicates with it, AMC provides it.