A Case Study of City Introduction
Abstract
Since only a limited pool of qualified native English-speaking
translators can do Chinese-English translation, it is inevitable for
native Chinese-speaking translators to translate out of their native
language. Influenced by their mother tongue, Chinese translators often
use some awkward expressions, which do not exist in English, in the
translated texts. This paper aims to explore how a comparable corpus
can be applied in Chinese-English translation to assist native
Chinese-speaking translators to make their translated texts sound
natural to native English speakers. To illustrate the point, a
comparable corpus on the subject of city introduction is constructed.
With the help of comparable corpus analysis tools, sentence length,
lexical density, and other statistics which can reflect the stylistic
features of the translated texts are derived. It is argued that a
comparable corpus, which can provide examples of natural expressions in
the target language, plays an irreplaceable role in terminology
extraction and in spotting awkward collocations, and that it can also
pick up small errors which are often neglected by non-native
English-speaking translators, such as the usage of articles.
Introduction
Ideally, translators should be native speakers of the target language.
This guideline is followed by many translation agencies serving
international institutions. The Occupational Outlook Handbook of the
U.S. Department of Labor also states clearly that translators normally
work from their secondary, or passive, language into their native, or
active, language.
However, this is not the case in Chinese-English (C-E) translation.
According to Xu Meijiang (2004), a senior translator in China's Central
Translation Bureau, although some qualified native English-speaking
translators are involved in C-E translating, editing or proofreading,
large volumes of C-E translation are done by native Chinese-speaking
translators alone. The situation is unlikely to change in the near
future for two reasons: first, only a limited pool of qualified native
English-speaking translators is available; second, the fees charged by
native English-speaking translators are much higher than those of their
Chinese counterparts. Statistics reported by Beijing Evening News
(2007) indicate that 60% of translation market demand cannot be met and
that China is in desperate need of qualified C-E translators. The
present problem, then, is how to improve the quality of the C-E
translation done by native Chinese-speaking translators. Corpora can be
a helpful tool for them.
The development of computer technology and the Internet makes the
comparable corpus-based approach accessible. In the corpus-based
approach, two subcorpora are to be constructed:
Subcorpus A—C-E translation done by Chinese translators;
Subcorpus B—English texts on the same subject written by native English speakers.
First,
since computers are widely used in translators' everyday work,
electronic translation texts are available, which enables the
construction of Subcorpus A. Second, the Internet provides a huge
archive of texts written by native English speakers, storing the most
recently updated language and information on various subjects and
making the construction of Subcorpus B easier than ever before. Third,
the advancement of software engineering offers tools to process the
corpus. Customizable corpus analysis software is produced to meet
different research and study needs. Wordsmith (Scott 1996), MonoConc
(Barlow 1999) and AntConc (Laurence Anthony 2007) are the most common
corpus analysis software packages and are widely used in the fields of
literature, pedagogy, linguistics and translation studies.
Machine-readable texts and computer programs make quantitative language
study possible, offering new approaches to improve the quality of
translation.
This paper aims to examine how comparable corpora can be used to
enhance the quality of Chinese-English translations done by non-native
English-speaking translators. To illustrate the point, a comparable
corpus comprising original English texts and texts translated into
English on the subject of City Introduction is constructed, and the
paper examines how it can help translators who are translating out of
their native language to use idiomatic expressions.
The 20th century saw a dramatic change in translation studies: a shift
from traditional prescriptive study to descriptive study, which
directly promoted the development of corpus-based translation studies.
Scholars and translation education professionals, who used to conduct
translation studies or provide translation training on an intuitive
basis, started to do empirical research, relying on both original and
translated texts. As a result, various kinds of translational corpora
have been constructed to meet different needs in descriptive and
practical translation studies.
It is generally acknowledged that Mona Baker is a pioneer in
corpus-based translation studies, since she was the first person to
conceive the idea of translational corpus construction and actually set
up one—the Translational English Corpus (TEC). TEC, a project funded by
the British Academy, was started in 1996 and opened to the public on
line in 1999. Translated from European languages such as French, German
and Spanish and non-European languages such as Chinese and Thai, the
texts in the corpus are taken directly from publications. Mona Baker
and other faculty members in the University of Manchester Institute of
Science & Technology (UMIST) have done translation studies on the
basis of TEC. Basically, TEC-based translation studies fall into three
categories: features of translationese; studies on translator's style;
social and cultural influence on translation.
Compared with original texts, translationese, the language of
translated texts, has its own special features, and comparative studies
have been done to reveal the differences. Baker (1996) observed that
translated texts usually show explicitation, simplification,
normalization and leveling out. Comparing the use of "that" before an
object clause in TEC and the British National Corpus (BNC), Olohan and
Baker (2000) found that the proportion of "that" was much higher in TEC
than in the BNC, which further demonstrated the feature of
explicitation. Besides simplification, explicitation and normalization,
Sara Laviosa (1998) added three more features: avoidance of repetitions
present in the source text, discourse transfer and the law of
interference, and distinctive distribution of target-language items.
TEC has also been used to study the different styles of translators. By
comparing the type-token ratio, sentence length and narrative structure
in the translations of Peter Bush and Peter Clark, two British
translators, Baker (2000) concluded that Clark had a more direct style
than Bush.
Cultural differences between nations are revealed through
comparisons between TEC and texts originally written in English. For
example, Laviosa (2002) showed differences in cultural messages by
comparing the news subcorpus of the English Comparable Corpus (ECC), a
corpus she constructed herself that included 396 articles from the
Guardian and Europe Journal, with the news subcorpus of TEC, which
included news translated from German, Slavic, Italian, etc.
Descriptive translation study lays the foundation for practical
translation studies. The universals of translation revealed in
corpus-based descriptive translation studies suggest ways in which
translators can make their translations sound more natural to
target-language readers. Besides, the methodologies used in descriptive
translation study can inspire those involved in translation practice
and in other practical translation studies.
While a wide array of corpora has been applied in descriptive
translation studies, efforts have also been made to adapt corpora to
practical translation studies. Federico Zanettin raised the idea of
using corpora in the training of translators in 1998 and illustrated
the point by presenting an experiment in which the Olympics corpus was
used by a group of trainee translators to translate an Italian sports
article into English. Since then, scholars have begun to pay attention
to the role corpora can play in translation education, and new
approaches have been developed. Jennifer Pearson (2000) noted that
parallel corpora were very useful in the translator training
environment because they could show the trainees "how professional
translators have overcome specific translation problems." Natalie
Kübler (2000) illustrated how to use specialized and general corpora
and corpus query tools to look for term candidates and their
phraseology. Krista Varantola (2000) introduced a new type of
corpus—disposable corpora which were used as performance-enhancing
tools in the training of prospective professional translators and she
also demonstrated how to apply Wordsmith Tools in corpus analysis.
1.2 Problems in Corpus-based C-E Translation Study
Since equivalents can be easily extracted from aligned parallel
corpora, they are extensively used in translation practice. The
significant role parallel corpora play in terminology extraction is not
in dispute here. However, when focusing on Chinese-English translation
study, relying solely on parallel corpora presents problems.
First of all, high-quality C-E translations are comparatively rare
since most C-E translations are done by native Chinese translators, who
live in a Chinese-speaking environment and have little peer support
from native English speakers. One can easily spot "unconventional" and
"creative" expressions in these translations which, in most cases,
confuse native English readers. These translations can hardly meet the
need of communication between source language writers and target
language readers. Therefore, the quality of a parallel corpus
containing poor translations as raw materials is in doubt.
Secondly, it is difficult to align a parallel corpus of high-quality
C-E translation since English is a language of hypotaxis while Chinese
is a language of parataxis. To make the translation sound natural to
native English readers, translators need to bring out the implied logic
in Chinese texts by using discourse markers or other means. Absolute
equivalence in syntactic structures does not exist. Therefore, a huge
amount of aligning work will be involved in parallel corpus compiling
since automatic construction is difficult to carry out.
Therefore, a comparable corpus, which provides samples of language
as it is used naturally by native English speakers, is extremely
useful for translators who translate out of their mother tongue. A
comparable corpus includes a collection of texts written by native
speakers of the target language on the same topic as the translated
texts (city introduction is the topic in this paper). Translators can
imitate the sentence patterns and idiomatic expressions used by native
speakers.
This paper aims to illustrate the value a monolingual comparable
corpus has in Chinese-English translation practice and to demonstrate
how a comparable corpus can be used in C-E translation practice to
enhance the quality of the translation done by a non-native English
speaker. Therefore, it is a practical translation study.
In the experiment, a comparable corpus comprising two English
subcorpora, a collection of translated texts and a collection of
original texts, is constructed. The comparable corpus is the most
important translation corpus for translators who translate out of their
mother tongue. As already mentioned, the involvement of native Chinese
translators in C-E translation is unavoidable, yet the ultimate goal of
their translation practice should be to make the translated texts
understandable and natural-sounding to native target-language readers.
This aim is not easy to achieve in C-E translation without the
participation of native English speakers. Therefore, the comparable
corpus, which serves as an English consultant, plays an irreplaceable
role in C-E translation practice.
Unlike paper texts, electronic corpora can be processed automatically
by computer. In this study, three freely available
programs are used in corpus analysis, terminology extraction and corpus
construction, namely, A Corpus Worker's Toolkit (ACWT), AntConc and
GoTagger.
As the largest corpus, the Internet provides an almost unlimited
number of electronic articles updated every minute. The vast pool of
information serves well as a translation corpus resource. The
comparable corpus used in the experiment is a disposable corpus which
has two subcorpora on the same subject—City Introduction.
Two steps are involved in the construction of Subcorpus A: data collection and compilation.
In the process of data collection, it was found that C-E translated
articles on city introduction can be obtained from several kinds of
website, including tourism websites, investment-promotion websites and
local government websites. Since tourism websites are, in most cases,
commercial, their city introductions unavoidably contain descriptive
paragraphs that function as advertisements. Most articles compiled in
the corpus are therefore taken from government-run websites and mainly
provide factual information. The search strategy in the data collection
process is accordingly quite simple: downloading the city introduction
pages from China's local government websites (usually the websites of
provincial capital cities).
However, web pages cannot be processed by corpus analysis software
directly. The articles in HTML format need to be converted into plain
text (txt) format. In this step, "A Corpus Worker's Toolkit" (ACWT) is
used to do the conversion. First, the web page is opened in NoteTab.
Then the HTML-to-text conversion tool is run to get the article in txt
form. After all texts have been converted into txt form, the merge file
tool is applied to obtain a single file. ACWT greatly reduces the
tedious and mechanical work of corpus compilation.
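The conversion and merging steps can also be scripted. The following is a minimal sketch in Python using only the standard library. It does not reproduce how ACWT or NoteTab work internally, and the names `html_to_text` and `merge_texts` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def merge_texts(texts):
    """Join converted articles into a single corpus string, skipping empty ones."""
    return "\n\n".join(t for t in texts if t)


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Hello <b>city</b></p><script>x = 1</script></body></html>")
print(html_to_text(page))  # Hello city
```

Because text chunks are joined with a space, punctuation that directly follows a closing tag ends up separated from the preceding word, so a real corpus would need a little more clean-up than this sketch performs.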
Since the articles in Subcorpus A are factual information on
cities, an English online encyclopedia is chosen as the source for
Subcorpus B. Compared with articles in Wikipedia, Encyclopedia
Britannica and other online English encyclopedias, the texts in
Encarta, a digital multimedia encyclopedia published by Microsoft
Corporation, are more relevant to the texts in Subcorpus A. Therefore,
five metropolitan introductions are selected. The same compilation
strategy as in Subcorpus A construction is applied here. The detailed
quantitative characteristics of the corpus are shown in Table 1.
| | Number of Articles | Tokens | Types |
|---|---|---|---|
| Subcorpus A | 22 | 28947 | 4653 |
| Subcorpus B | 5 | 28816 | 4949 |
Table 1 Corpus Characteristics
As Table 1 shows, the two subcorpora are comparable because their sizes
are quite similar. Some measures are meaningful only when the token
counts are similar, such as the type/token ratio, which is the ratio of
different words to total words. Since the vocabulary of English is
finite, the number of types is bounded, whereas the number of tokens
can grow without limit as a corpus gets larger (Yang Huizhong, 2002).
The corpus analysis tools introduced above are applied to explore
the comparable corpus in order to get information on stylistic
features, to do terminology extraction, and to check whether
expressions in the translated texts sound natural to target-language
readers. The main processing steps are described below.
In calculating the lexical density (LD), the formula is taken from
ACWT: Lexical Density = (Number of different words / Total number of
words) x 100. To obtain these two numbers, the word counter tool in
Microsoft Word and the wordlist tool in AntConc are applied. The data
are then entered into the formula to get the LD of the two subcorpora.
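As a check on the formula, the calculation can be sketched in a few lines of Python. The type and token counts are those reported in Table 1. The helper `count_types_tokens` is a hypothetical stand-in for the word counter and wordlist tools, and its tokenization rule is an assumption that may not match AntConc's.

```python
import re


def count_types_tokens(text):
    """Return (types, tokens) for a text, ignoring case.

    Assumption: a word is a run of letters, optionally joined by an
    apostrophe or hyphen. AntConc's own token definition may differ.
    """
    words = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    return len(set(words)), len(words)


def lexical_density(types, tokens):
    """Lexical Density = (different words / total words) x 100."""
    return types / tokens * 100


# Type and token counts reported in Table 1.
ld_a = lexical_density(4653, 28947)
ld_b = lexical_density(4949, 28816)
print(f"Subcorpus A: {ld_a:.1f}%")  # Subcorpus A: 16.1%
print(f"Subcorpus B: {ld_b:.1f}%")  # Subcorpus B: 17.2%
```

Computed this way, the measure is the same quantity as the type/token ratio expressed as a percentage, which is why it is only comparable between subcorpora of similar size.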
In measuring sentence length, the formula is Sentence Length = number
of tokens / (number of full stops + number of exclamation marks +
number of question marks). ACWT is applied in counting the punctuation
marks mentioned above.
| | Full Stops | Exclamation Marks | Question Marks | Sentence Length (words) | Lexical Density |
|---|---|---|---|---|---|
| Subcorpus A | 2075 | 0 | 8 | 13.9 | 16.1% |
| Subcorpus B | 1466 | 1 | 2 | 19.6 | 17.2% |
Table 2 Data on Stylistic Features
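The sentence-length figures in Table 2 can be reproduced with the short Python sketch below, which applies the formula to the token counts from Table 1 and the punctuation counts from Table 2. The helper `count_terminators` is a hypothetical illustration of the counting step and simply counts characters.

```python
def count_terminators(text):
    """Return counts of full stops, exclamation marks and question marks."""
    return text.count("."), text.count("!"), text.count("?")


def average_sentence_length(tokens, full_stops, exclamation_marks, question_marks):
    """Sentence Length = tokens / number of sentence-final punctuation marks."""
    sentences = full_stops + exclamation_marks + question_marks
    if sentences == 0:
        raise ValueError("no sentence-final punctuation found")
    return tokens / sentences


# Token counts from Table 1, punctuation counts from Table 2.
length_a = average_sentence_length(28947, 2075, 0, 8)
length_b = average_sentence_length(28816, 1466, 1, 2)
print(f"Subcorpus A: {length_a:.1f} words")  # Subcorpus A: 13.9 words
print(f"Subcorpus B: {length_b:.1f} words")  # Subcorpus B: 19.6 words
```

Counting every full stop as a sentence boundary treats abbreviations such as "U.S." and decimal numbers as sentence ends, so the measure slightly underestimates true sentence length in both subcorpora.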
Besides revealing the stylistic features of the translation,
comparable corpora can be used in terminology extraction and to
show the context in which the terms occur in native speakers' writing.
AntConc is the corpus query software used in this process. Since the
number of texts compiled in the disposable corpus is limited, the
British National Corpus (BNC) is used as a supplement to Subcorpus B
for terminology extraction. Parallel corpora and online dictionaries
play a complementary role in actual C-E translation practice:
equivalents of the Chinese terms are first looked up in a parallel
corpus or an online dictionary, and usually several candidate terms are
found. It is then the comparable corpus that tells which candidate term
is the natural expression in the target language and how to combine it
with other words in context. The following example illustrates the
idea.
The term "公共交通" (gōng gòng jiāo tōng, literally public transport)
often occurs in city introductions. Looking up the Chinese term in the
China National Knowledge Infrastructure (CNKI) online dictionary, a
dictionary based on parallel corpora, one gets three candidate terms:
public transportation, public traffic and public transport. Querying
the three terms in the comparable corpus with the concordance tool in
AntConc, one finds that "public traffic" appears only in Subcorpus A,
while "public transport" and "public transportation" occur in
Subcorpus B with similar frequency. When queried in the BNC, "public
transport" has 929 occurrences, "public transportation" has 6, and
"public traffic" returns no hits. The ratio of occurrences of "public
transport" to "public transportation" would have been more favorable to
the latter if a U.S. corpus had been consulted. Clearly, "public
traffic" is an unnatural expression in English.
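The candidate check described above amounts to counting each candidate term in the translated and the native subcorpus. The Python sketch below illustrates the idea. It is not AntConc's implementation, the function names are hypothetical, and the two sample strings are invented stand-ins for the real subcorpora.

```python
import re


def count_phrase(text, phrase):
    """Count whole-word, case-insensitive occurrences of a phrase."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def compare_candidates(candidates, translated_text, native_text):
    """Return the frequency of each candidate term in both subcorpora."""
    return {
        c: {"translated": count_phrase(translated_text, c),
            "native": count_phrase(native_text, c)}
        for c in candidates
    }


# Invented sample sentences, used only to show the mechanics.
translated_sample = "The public traffic of the city is convenient."
native_sample = ("Public transport is extensive. "
                 "The public transportation system includes a subway. "
                 "Most commuters rely on public transport.")

table = compare_candidates(
    ["public transportation", "public traffic", "public transport"],
    translated_sample,
    native_sample,
)
for term, counts in table.items():
    print(term, counts)
```

The word-boundary anchors matter here: without them, every hit for "public transportation" would also be counted as a hit for "public transport".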
GoTagger07 is the tool used for part-of-speech tagging (POS tagging). The statistics in Table 3 are derived from the tagged texts.
| | Subcorpus A | Subcorpus B |
|---|---|---|
| Determiner | 2729 | 3540 |
| Coordinating Conjunction | 1341 | 1181 |
| Adjective | 3500 | 3153 |
| Noun (excluding proper nouns) | 6372 | 5148 |
| Personal Pronoun | 136 | 223 |
| Adverb | 481 | 707 |
| Verb, base form | 216 | 317 |
| Verb, past tense | 844 | 1082 |
| Verb, non-3rd ps. sing. | 299 | 274 |
| Verb, 3rd ps. sing. present | 572 | 562 |
| Verb (total of the four rows above) | 1931 | 2235 |
| Verb, gerund/present participle | 586 | 451 |
| Verb, past participle | 821 | 777 |
| wh-determiner | 98 | 138 |
| wh-pronoun | 18 | 53 |
| Possessive wh-pronoun | 7 | 5 |
| wh-adverb | 19 | 66 |
Table 3 Part-of-Speech Statistics
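Counts like those above can be derived from tagged text with a simple tally. The Python sketch below assumes, purely for illustration, that the tagger writes each token as `word_TAG` with Penn Treebank style tags. The actual GoTagger output format is not described here, so both the separator and the tag names are assumptions, and the sample sentence is invented.

```python
from collections import Counter

# Assumed Penn Treebank tags for the four finite/base verb categories.
PREDICATE_VERB_TAGS = ("VB", "VBD", "VBP", "VBZ")


def count_tags(tagged_text, sep="_"):
    """Tally POS tags in text whose tokens look like word<sep>TAG."""
    counts = Counter()
    for item in tagged_text.split():
        word, found, tag = item.rpartition(sep)
        if found and word and tag:
            counts[tag] += 1
    return counts


def predicate_verbs(counts):
    """Sum the four verb categories that make up the combined verb count."""
    return sum(counts[tag] for tag in PREDICATE_VERB_TAGS)


# Invented tagged sentence in the assumed format.
sample = "The_DT city_NN has_VBZ a_DT long_JJ history_NN ._."
counts = count_tags(sample)
print(counts["DT"], counts["NN"], predicate_verbs(counts))  # 2 2 1
```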
Translated texts differ so markedly in style from texts originally
written in a language that a term, translationese, was coined to
describe their language. In this study, some special features of the
translated texts have been spotted. Compared with the city
introductions originally written in English in Subcorpus B, the
translated city introductions tend to have shorter sentences with
simpler sentence patterns, fewer different words, more nouns, and fewer
verbs.
First of all, translators form shorter sentences and are more likely
to use simple and compound sentences than target-language writers. As
Table 2 shows, the average sentence length in Subcorpus A is 5.7 words
shorter than that in Subcorpus B. Moreover, there is a considerable
difference in sentence patterns between the two subcorpora. As
Table 2 shows, Subcorpus A has eight interrogative sentences, while
Subcorpus B has two interrogative sentences and one exclamatory
sentence. With so few questions in either subcorpus, most wh- words
must be serving as subordinate clause links. As the statistics in
Table 3 show, the number of wh- words in Subcorpus B is almost double
that in Subcorpus A. Thus we can conclude that more complex sentences
are used in texts originally written by native English speakers than in
translations done by native Chinese speakers.
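The wh- comparison can be verified by summing the four wh- rows of Table 3 for each subcorpus, as in this short Python check. The dictionary layout is just a convenience for the illustration.

```python
# (Subcorpus A, Subcorpus B) counts for the wh- rows of Table 3.
wh_counts = {
    "wh-determiner": (98, 138),
    "wh-pronoun": (18, 53),
    "possessive wh-pronoun": (7, 5),
    "wh-adverb": (19, 66),
}

total_a = sum(a for a, _ in wh_counts.values())
total_b = sum(b for _, b in wh_counts.values())
ratio = total_b / total_a

print(total_a, total_b, round(ratio, 2))  # 142 262 1.85
```

Since the two subcorpora have almost the same number of tokens, the raw totals can be compared directly without normalizing.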
Secondly, compared with the texts in Subcorpus B, the translated texts
show less word variety, although the difference is not striking. As
Table 2 demonstrates, the lexical density in Subcorpus A is only 1.1
percentage points lower than that in Subcorpus B.
Thirdly, the translated texts have more nouns and fewer verbs than
the texts originally written in English. It is observed from Table 3
that nouns (excluding proper nouns) make up 22.0% of all words in
Subcorpus A and 17.9% in Subcorpus B, while predicate verbs account for
6.7% in Subcorpus A and 7.8% in Subcorpus B. Relative to Subcorpus B,
Subcorpus A is thus 4.1 percentage points higher in nouns and 1.1
percentage points lower in verbs.
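These proportions follow from dividing the Table 3 counts by the token counts in Table 1. The Python sketch below reproduces them, taking the predicate verb count as the sum of the base form, past tense, non-3rd person singular and 3rd person singular rows, which is what the combined verb row of Table 3 adds up to.

```python
TOKENS = {"A": 28947, "B": 28816}   # Table 1
NOUNS = {"A": 6372, "B": 5148}      # Table 3, nouns excluding proper nouns
# Table 3: base form, past tense, non-3rd ps. sing., 3rd ps. sing. present.
VERB_ROWS = {"A": [216, 844, 299, 572], "B": [317, 1082, 274, 562]}


def share(count, tokens):
    """Return count as a percentage of all tokens."""
    return count / tokens * 100


noun_share = {k: share(NOUNS[k], TOKENS[k]) for k in TOKENS}
verb_share = {k: share(sum(VERB_ROWS[k]), TOKENS[k]) for k in TOKENS}

for k in ("A", "B"):
    print(f"Subcorpus {k}: nouns {noun_share[k]:.1f}%, verbs {verb_share[k]:.1f}%")
# Subcorpus A: nouns 22.0%, verbs 6.7%
# Subcorpus B: nouns 17.9%, verbs 7.8%
```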
As the three stylistic features of the translated texts described
above show, sentence patterns and word variety can be improved to make
the translation sound more natural to target-language readers.
Native English speakers are the intended readers of the English
translation. Therefore, the basic quality that a good piece of
translation should have is that the language should sound natural to
the target-language readers. Though an easy criterion for translators
who translate into their mother tongue, it is quite a challenge for
translators who translate out of their native language. In this sense,
a comparable corpus, which provides examples of native English
speakers' expressions, can assist native Chinese translators to use
idiomatic expressions by providing the context in which terms occur in
native speakers' writings, spotting awkward collocations and
highlighting some small errors which are often overlooked by non-native
speakers, such as the use of articles.
First of all, to produce a good translation with accurate use of
terminology, a corpus is an indispensable tool because it can display
the context where these terms occur in native speakers' writing.
Compared with a traditional paper dictionary, a comparable corpus is
more efficient in terminology extraction. Looking words up in a heavy,
thick paper dictionary is quite time-consuming. Moreover, the word
entries in a paper dictionary are fixed. Since vast numbers of new
words are coined every day, a fixed paper dictionary can never keep up
with the development of society and technology. The shortcomings of
paper dictionaries are overcome by online dictionaries. An
Internet-based dictionary (such as the yodao dictionary) can be updated
every day. However, almost all the examples provided by these online
dictionaries are extracted from C-E translations which, in most cases,
were done by native Chinese translators. Besides, since the example
sentences are retrieved from the Internet without careful selection,
their quality cannot be guaranteed. In contrast, the illustrative
sentences extracted from a comparable corpus based on well-selected
texts written by native target-language speakers are more reliable and
sound more natural.
Secondly, together with corpus analysis tools, comparable corpora
can be applied in spotting awkward collocations in the translated
texts. Choosing suitable words is only part of the task; combining them
is a further, complicated problem. Awkward collocations are a common
error that impedes the target-language reader's understanding. In
finding these unnatural expressions, a concordancer, which is contained
in most corpus analysis software, is a very effective tool. One may
simply type in the phrase one is unsure about and query it in both
subcorpora. If the collocation has occurred in a similar context, then
it can be used. If not, the core of the phrase is typed in and the
corpus based on texts written by native speakers is queried to derive a
natural expression. For example, "自然条件" (zì rán tiáo jiàn, used in
the sense of natural resources) was translated into "natural
condition(s)" in three translated texts (the Suzhou, Foshan and
Shaoguan city introductions). However, no hit was returned in Subcorpus
B. The phrase was then queried in the BNC, resulting in ten hits. But a
close look at the sentences showed that "natural condition" there means
a condition which is not made or controlled by human beings, which is
not the sense intended in the city introductions.
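What a concordancer returns for such a query is a set of keyword-in-context (KWIC) lines, which is what makes the close reading of each hit possible. The Python sketch below shows the idea in its simplest form. It is not AntConc's implementation, the function name `kwic` is hypothetical, and the sample text is invented.

```python
import re


def kwic(text, phrase, width=30):
    """Return (left context, match, right context) for each hit of a phrase."""
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


# Invented sample standing in for a corpus of native English writing.
native_sample = ("The wetlands were restored to their natural condition after "
                 "the dam was removed. The region is rich in natural resources "
                 "such as coal and timber.")

for left, hit, right in kwic(native_sample, "natural condition"):
    print(f"{left:>30} [{hit}] {right}")
```

The pattern matches the exact word forms only, so the singular and plural of a phrase have to be queried separately.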
Thirdly, the quantitative comparison between the translated texts
and texts written by native English speakers can reveal subtle errors
which impair the quality of the translation but are often overlooked by
non-native English speakers. For example, articles (a, an, the) seldom
affect the meaning greatly, but their absence can make a text sound
strange. Since articles do not exist in Chinese, they are often
forgotten by native Chinese translators. As Table 4 shows, the
proportion of articles in Subcorpus A is 2.9 percentage points lower
than that in Subcorpus B, and the number of occurrences of "the" in the
translated texts is considerably lower than in the texts written by
native English speakers. Correcting this slight difference would
considerably improve the translation.
| | the | a | an | Articles (total) | Ratio (% of tokens) |
|---|---|---|---|---|---|
| Subcorpus A | 1685 | 296 | 127 | 2108 | 7.3% |
| Subcorpus B | 2447 | 427 | 74 | 2948 | 10.2% |
Table 4 Articles
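The ratios in Table 4 are the article totals divided by the token counts in Table 1. A minimal Python sketch of the count and the ratio follows. The function `count_articles` is a hypothetical illustration of how such counts could be obtained from raw text and is not the tool used in the study.

```python
import re
from collections import Counter


def count_articles(text):
    """Count occurrences of 'the', 'a' and 'an' as whole words, ignoring case."""
    return Counter(re.findall(r"\b(?:the|an|a)\b", text.lower()))


def article_ratio(article_count, tokens):
    """Return articles as a percentage of all tokens."""
    return article_count / tokens * 100


# Article counts from Table 4, token counts from Table 1.
articles_a = 1685 + 296 + 127
articles_b = 2447 + 427 + 74
ratio_a = article_ratio(articles_a, 28947)
ratio_b = article_ratio(articles_b, 28816)
print(f"Subcorpus A: {ratio_a:.1f}%")  # Subcorpus A: 7.3%
print(f"Subcorpus B: {ratio_b:.1f}%")  # Subcorpus B: 10.2%

print(dict(count_articles("The city has a park and an old town near the river.")))
```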
This study conducted an experiment to explore ways to use
comparable corpora in translation studies, with the aim of helping
translators who translate out of their native language to enhance the
quality of their translated texts. By carrying out a quantitative
analysis, we acquired data which indicate the special stylistic
features of the translated texts in Subcorpus A, produced by non-native
English-speaking translators, compared with the texts in Subcorpus B,
written by native English speakers. Translators can work on these
stylistic differences. Furthermore, this paper argues that the
comparable corpus is an indispensable tool in terminology extraction by
showing how to use the corpus in the process. Besides, this paper
explores ways to apply a comparable corpus in making the translation
sound natural.
As this paper has shown, comparable corpora play a significant role
in translation study and practice. However, they also have some
limitations. First, one may not find comparable texts in the target
language. For example, it is difficult to find comparable material for
fictional works, which usually contain many cultural elements that are
unique to a nation. One can hardly find an English novel which is
comparable to the Chinese novel Dream of the Red Chamber (《红楼梦》).
Therefore, the comparable corpus is mainly useful in translating
universal topics. Second, comparable corpora are not very helpful in
translating materials in which creative expressions are required, since
they only allow translators to use expressions that already exist.
However, for native Chinese translators, parroting English speakers'
words is not a bad idea because it at least makes the translated texts
readable and understandable to the target language reader.
Although comparable corpora have some shortcomings, their potential
in translation studies is not to be underestimated. In addition to
studies at the word level and the syntactic level, further studies can
be carried out on the application of comparable corpora to
discourse-level translation studies. Cohesive devices such as discourse
markers, and theme and rheme distribution, can be studied
quantitatively. Furthermore, strategies for constructing comparable
corpora using the Internet as their source can be developed.
References
- Baker, M. (1996). Corpus-based translation studies: the challenges that lie ahead. In H. Somers (ed.). Terminology, LSP and Translation: Studies in Language Engineering, in Honour of Juan C. Sager. Amsterdam: John Benjamins.
- Baker, M. (2000). Towards a Methodology for Investigating the Style of a Literary Translator. Target 12, 241-266.
- Laviosa, S. (1998). Universals of Translation. In Mona Baker (ed.). Routledge Encyclopedia of Translation Studies. London: Routledge.
- Laviosa, S. (2002). Corpus-based Translation Studies: Theory, Findings, Applications. Amsterdam: Rodopi.
- Olohan, M. & Baker, M. (2000). Reporting "that" in translated English: Evidence for subconscious processes of explicitation? Across Languages and Cultures, 1(2), 141-158.
- Kübler, N. (2000). Corpora and LSP Translation. In Federico Zanettin, Silvia Bernardini, Dominic Stewart (eds.). Corpora in Translator Education. Beijing: Foreign Language Teaching and Research Press.
- Pearson, J. (2000). Using Parallel Texts in the Translator Training Environment. In Federico Zanettin, Silvia Bernardini, Dominic Stewart (eds.). Corpora in Translator Education. Beijing: Foreign Language Teaching and Research Press.
- Varantola, K. (2000). Translators and Disposable Corpora. In Federico Zanettin, Silvia Bernardini, Dominic Stewart (eds.). Corpora in Translator Education. Beijing: Foreign Language Teaching and Research Press.
- Zanettin, F. (1998). Bilingual Comparable Corpora and the Training of Translators. Meta 4, 1—14.
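- Chen Jianghong [陈江宏]. (2007). China's shortfall in translators reaches 60%; Chinese-to-English translators are in shortest supply [中国翻译人才缺口达60% 翻译界最缺汉译英人才]. Beijing Evening News [《北京晚报》], 2007/8/20.
- Xu Meijiang [徐梅江]. (2004). The basic model of Chinese-English translation and its development trend [汉译英基本模式及其发展趋势]. http://www.cctb.net/wjjg/wxb/wxbkycg/200408040002.htm, 2008/4/6.
- Yang Huizhong [杨惠中]. (2002). An Introduction to Corpus Linguistics [《语料库语言导论》]. Shanghai: Shanghai Foreign Language Education Press.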
by Guangsa Jin
Peking University, China
This article was originally published at http://accurapid.com/journal/toc.htm
All rights reserved.