Chinese linguists are close to completing China's largest database of spoken Chinese, which they will use as the basis for compiling the country's first dictionary and grammar book of modern spoken Chinese.
Shen Jiaxuan, director of the Institute of Linguistics at the Chinese Academy of Social Sciences (CASS), said the database includes three sub-databases: a base of live Chinese conversations recorded in Beijing, a base covering the dialects of six cities (Shanghai, Xi'an, Guangzhou, Beijing, Chongqing and Xiamen), and a base of phonetic symbols for modern spoken Chinese.
The live conversation base now holds 650 hours of conversations recorded in Beijing, which have been transcribed into 8.9 million words of text.
Shen has led this key CASS research project for the past four years. Linguists have traditionally built databases of written language first, but they now recognize that databases of spoken language are equally important to linguistic research. Advances in information and multimedia technology have made such research feasible.
"Developed countries have already done a great deal of research on building databases of spoken languages, including Chinese," Shen said.
Some multinational information technology companies are eager to recruit local talent to develop spoken Chinese databases.
"If we do not speed up building our own spoken language database," Shen said, "we might lose our advantage in studying spoken Chinese and developing applied technologies."
Shen and his research team are now developing management and search software for the spoken Chinese database, as well as a database of children's spoken Chinese, which is expected to support research on the mechanisms of language acquisition among Chinese children.
(Xinhua News Agency December 22, 2004)