Chinese scientists have succeeded in digitizing all written
Chinese characters using a four-byte coding technology, which
allows ancient texts crammed with rare characters to be
printed.
 
Wang Hongyuan, the inventor of the coding technology, said Tuesday
that it would help people type and search for any Chinese character
and solve problems caused by the extremely rare symbols or words
that can crop up in daily life.
 
Taking bank services as an example, Wang said, "If a man's name
contained rare characters, he could have difficulty setting up a
deposit account because the bank's computer system did not have a
sophisticated enough coding method to recognize his name."
 
With the four-byte coding technology, people can type 70,000
characters on any computer equipped with the coordinated database,
Wang said. He pointed out that the original two-byte coding could
handle just 20,000 characters.
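
The gap between the two codings can be illustrated with a short sketch. The article does not name the coding scheme, so the example below uses the standard GBK and GB18030 codecs bundled with Python purely as stand-ins for a two-byte and a four-byte coding; the helper names are hypothetical. It shows that a fixed two-byte code can distinguish at most 65,536 values, and that a rare character may have no two-byte representation at all while fitting in four bytes.

```python
# Illustration only: GBK and GB18030 stand in for the "two-byte" and
# "four-byte" codings. The article does not say which scheme Wang used.

def code_space(n_bytes):
    """Upper bound on distinct values for a fixed-width code of n_bytes."""
    return 256 ** n_bytes

def encoded_length(char, encoding):
    """Return the number of bytes needed for char, or None if the
    encoding has no representation for it."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None

COMMON = "\u4e2d"      # an everyday character (zhong, "middle")
RARE = "\U00020000"    # a rare character from CJK Extension B

if __name__ == "__main__":
    print("two-byte code space: ", code_space(2))
    print("four-byte code space:", code_space(4))
    for label, ch in (("common", COMMON), ("rare", RARE)):
        print(label,
              "| gbk:", encoded_length(ch, "gbk"),
              "| gb18030:", encoded_length(ch, "gb18030"))
```

In practice a two-byte scheme covers far fewer than 65,536 characters, because many byte values are reserved for compatibility with single-byte text, which is consistent with the figure of about 20,000 that Wang cites.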
 
Statistics show that 60 million Chinese people out of a population
of 1.3 billion have rare characters in their names.
 
Wang said that although some printing methods for rare characters
had been developed, there had been no database that included the
format, spelling, pronunciation and source of the
characters.
 
Feng Zheng, an expert in the Chinese language at the Beijing-based
Capital Normal University, said that research into the Chinese
language had encountered difficulties because of the lack of
digitized reading materials.
 
Generally speaking, he said, about one in every 1,000 characters
in a single ancient book was too rare to be printed with two-byte
coding. This meant that many ancient books could not be issued in
a digitized version accessible to researchers.
 
However, Wang Hongyuan said, the database built on four-byte coding
comprises 13 categories with millions of records, covering almost
all ancient Chinese dictionaries, documents and files.
 
The Kangxi Dictionary, a famous Chinese dictionary compiled
during the reign of the Kangxi Emperor of the Qing Dynasty (1644-1911),
is being prepared for publication thanks to four-byte coding. The
dictionary is best known for including the rarest characters in the
Chinese language.
  
"Apart from its own meaning, one character also embodies the
culture and history of the user," Feng said. "We should preserve
and protect our Chinese characters by using this advanced
technology."
   
According to Wang, the four-byte coding and the coordinated
database are now the subject of 20 patent applications and have been
on trial in more than 100 Chinese and foreign universities. In the
long term, the database will be used to design digitized textbooks
on historic characters for Chinese primary and middle
schools.
 
Currently, 1.5 billion people use the Chinese language, and the
number of people learning Chinese worldwide is estimated at
30 million.
(Xinhua News Agency March 29, 2006)