The Interpretive Model and Machine Translation
Posted on Thursday, October 18 @ 05:31:58 EDT
Topic: Translation Technology

For a long time, translation formed part of linguistic studies (see G. Mounin’s works). Over the last few decades, however, it has been institutionally associated with the “Language Sciences”, a vast and very dynamic field in which interdisciplinarity plays a key role.

This association has led to the emergence, within the field of Language Sciences, of a translation science (traductology or translation studies) that deals not with “translation” as such but with “translation operations and processes”, reflecting a change in the perspective from which the object of study is approached.



Our aim is to put forward an epistemological analytical grid of the field in question, i.e. the works devoted to the analytical study of translation and its natural processing as a prelude to machine translation or computer-assisted translation. Delimiting a field, however, requires one or several perspectives from which to define its axes, issues, methods and aims.

Therefore, we will first lay out a broad outline of the theoretical conflict between the issues of meaning and translation. We will then explain how this conflict transcends logical formalization. The aim of devising a theory is to free translation pedagogy from the “interpretive model”. Finally, these issues will be re-examined so that the data can be reused for natural language processing (machine translation and computer-assisted translation).

To set up the analytical grid, we will draw on three basic fields of scientific methodology: the observation field, the hypothesis field and the validation field (see Auroux’s works). The purpose here is not to compare the observed approaches or to pass value judgments on them, but to examine them from the natural language processing perspective as a step prior to translation, because this perspective and its implementation are part of an “objective” process: they merely draw up assessments of specific data. In other words, we are first and foremost concerned with observation, supported by descriptions and validated data, so as to put these works into perspective and to delimit a field of specialization in the light of the principles introduced below.


The Methodological Choices
The methodological choices concern the perspectives selected in this study to analyze works on translation. These choices place the discipline at the crossroads of theoretical linguistics and scientific empiricism, on the premise that the effects of a theory are commensurate with its resulting applications. For this reason, the object of study, translation, will be approached descriptively, i.e. as it is practiced and as it evolves professionally. This object must nevertheless be defined by new analytical protocols and must move away from the prevailing interpretive models.

In keeping with these postulates, we will describe only attested works (corpora of translated and published texts) and use the elements regarded in these works as empirical, i.e. those that can be subjected to a “corroborative” or validation test. This aim does not, however, rule out observations of a different type based on information not contained in our corpora and works.

The data for analysis fall into three main categories: first, electronic texts associated with observations; second, a computerized system of hypotheses and indications; and finally, validation applications linking the hypotheses to the linguistic data arising from observation.

Our work is essentially based on three tools: electronic texts grouped into machine-readable corpora, tools for observing and classifying linguistic data, and corroborative tools to validate the results of observation.

The corpora used in this study had to meet Sinclair’s criteria. The observation of linguistic data should lead to the constitution of an object of study in accordance with a specific, well-defined extraction protocol. Results arising from observation must be “remarkable”, meaning that they should reveal high-frequency usages and occurrences in the reference corpus.
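To make the “remarkable” criterion concrete, here is a minimal Python sketch. It counts word bigrams in a small reference corpus and keeps those that reach a frequency threshold. The three sentences, the naive lowercase tokenization and the threshold `min_count=2` are illustrative assumptions, not the extraction protocol used in this study.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return an iterator over consecutive n-token sequences, as tuples."""
    return zip(*(tokens[i:] for i in range(n)))

def remarkable_ngrams(corpus, n=2, min_count=2):
    """Count n-grams over all sentences and return (n-gram, count) pairs whose
    count reaches min_count, most frequent first."""
    counts = Counter()
    for sentence in corpus:
        tokens = re.findall(r"\w+", sentence.lower())  # naive tokenization
        counts.update(ngrams(tokens, n))
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]

# Hypothetical reference corpus.
corpus = [
    "The Minister of Education met the press.",
    "The Minister of Education opened the school.",
    "The Minister of Health met the Minister of Education.",
]

for gram, count in remarkable_ngrams(corpus):
    print(" ".join(gram), count)
# the minister 4
# minister of 4
# of education 3
# met the 2
```

A real protocol would also have to normalize counts for corpus size before treating a frequency as remarkable.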

Consequently, our attention was turned to actual finished works (texts, sentences, expressions, terms) rather than to language-use practices (speaking, writing, memorizing). The idea was that such speech practices cannot be subjected to the rigorous requirements of data examination and that only observed works allow objective procedures to be applied. This does not mean at all, however, that what is observed reveals nothing of what is happening in the speaker’s mind[1].

This separation between “data” and “practices” has its counterpart in computer science in the distinction between the “declarative” and the “procedural”[2]. For the moment, we have to decide which type of data must be observed and, more specifically, which phrases and terms are potential subjects for a systematic study of translation.

Until now, our approach has rested on the empirically verified postulate that the corpus texts used for examining data consist of well-formed, successive phrases respecting specific constraints. This allows us to distinguish a discourse construction from an anarchic set of phrases lacking coherence and consistency.

This starting point is important because it places great emphasis, in observation and analysis, on textual linguistics as compared with theoretical and general linguistics. We are therefore pursuing several objectives: first, distinguishing a text from a series of phrases with no logical or semantic link between them; second, tagging the content of the text from a typological point of view (technical, journalistic, etc.); and finally, classifying the extracted information according to a previously defined protocol and linguistic criteria.

To achieve these objectives, not only must an observation methodology be adopted but results should also be expressed in appropriate language. Therefore, in a text, we must learn to observe, on the one hand, phrases according to the three levels of analysis (morphological, semantic, syntactic) and, on the other, relationships between phrases according to the discourse type (argumentative model or textual anaphora).

Once the methodology has been adopted, working hypotheses can be formulated along three main axes: first, the type of formalism used; second, linguistic extension or portability[3]; and finally, the aim or objectives of the analysis.

Concerning the first axis, we chose to make the results explicit by stating the hypotheses in a form that can be computerized, i.e. represented by an algorithm and read by a machine. This is the particular sense of “formalization”[4] that we wanted to adapt to the constraints of the translation process.

In this respect, there are two ways of “formalizing” linguistic data. One is entirely independent of the computerized tool that later processes the data and uses explicit instructions in the form of standard rules; the other relies on the formal possibilities of machine-readable algorithms to represent the linguistic information. The two ways are so often complementary, however, that it is best to start with the first before tackling the second. In both cases, machine-readable linguistic formalisms are obtained at the end of the procedure.
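The two ways of formalizing can be illustrated with a short Python sketch. The rule table is plain data, written without reference to any program (the first way), and a small function interprets it mechanically (the second way). The single rule, which rewrites an English “the X of Y” phrase as “Y's X”, is a hypothetical example chosen only to show the mechanics.

```python
import re

# First way: rules stated as plain data, independent of any program.
# Hypothetical rule: English "the X of Y" rewritten as "Y's X".
RULES = [
    {"name": "of-genitive", "pattern": r"\bthe (\w+) of (\w+)", "template": r"\2's \1"},
]

# Second way: a machine-readable procedure that interprets the rule table.
def apply_rules(text, rules):
    """Apply each declarative rule, in order, as a regular-expression rewrite."""
    for rule in rules:
        text = re.sub(rule["pattern"], rule["template"], text)
    return text

print(apply_rules("the house of Peter", RULES))  # Peter's house
```

Because the rules are data, the same table could be reviewed by a linguist or handed to a different engine without touching the interpreting code.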

Concerning the second axis, we chose a source text (ST) as the starting point and a target text (TT) as the end point in order to examine their interactions contrastively, through a range of structures of varying complexity that had to be described and extracted. Once a structure has been applied to the ST according to a specific protocol, it is simply searched for and validated in the TT. This is therefore a “source-oriented” view of linguistic extension.
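A source-oriented check of this kind might be sketched in Python as follows, assuming a small set of sentence-aligned English/French pairs. The structure (“Minister of X”) is defined only on the source side. Its assumed French correspondent (“ministre de/du/des”) is then searched for in the aligned target sentence. Both patterns and the sentence pairs are hypothetical.

```python
import re

# Hypothetical sentence-aligned pairs (English source, French target).
ALIGNED = [
    ("The Minister of Health resigned.", "Le ministre de la Santé a démissionné."),
    ("The Minister of Education spoke.", "Le ministre de l'Éducation a parlé."),
    ("The weather was fine.", "Il faisait beau."),
]

# Structure defined on the source side only.
SOURCE_PATTERN = re.compile(r"\bMinister of \w+", re.IGNORECASE)
# Assumed target-side correspondent of that structure.
TARGET_PATTERN = re.compile(r"\bministre (de|du|des)\b", re.IGNORECASE)

def source_oriented_check(pairs):
    """For every pair whose source contains the structure, report the source
    match and whether the assumed correspondent occurs in the target."""
    results = []
    for source, target in pairs:
        match = SOURCE_PATTERN.search(source)
        if match:
            results.append((match.group(), bool(TARGET_PATTERN.search(target))))
    return results

print(source_oriented_check(ALIGNED))
# [('Minister of Health', True), ('Minister of Education', True)]
```

Pairs whose source lacks the structure are ignored. This is what makes the check source-oriented: nothing found only in the target is ever examined.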

It should be mentioned at this point that translation studies distinguish two points of view in the practice and analysis of translations: the “source-oriented” point of view which favors the specificities and requirements peculiar to the source text (faithfulness, literality) and the “target-oriented” point of view which favors the target text (rewording, adaptation).

Concerning the third axis (the aim of the analysis), it should be noted that we already have the “inputs” and the “outputs”, i.e. we already know the results of the operation before even starting the formalization and implementation of the program because we are working on text corpora which have previously been translated and synchronized. The goal of this application is to show that the program runs in accordance with given specifications. In other words, the program implementation is mainly a validating procedure for the observation results[5].
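Because the expected outputs are already known, validation reduces to comparing what a program produces with the reference translations. The Python sketch below uses a deliberately naive word-for-word glossary as the “program” and a hypothetical two-pair reference corpus. The failing case (adjective order) is the kind of result that sends the analyst back to the observation stage.

```python
# Hypothetical word-for-word glossary standing in for the translation program.
GLOSSARY = {"the": "la", "house": "maison", "white": "blanche"}

def translate(sentence):
    """Replace each known word by its glossary entry, keeping word order."""
    return " ".join(GLOSSARY.get(word, word) for word in sentence.split())

# Previously translated, synchronized pairs: the expected outputs are known.
REFERENCE = [("the house", "la maison"), ("the white house", "la maison blanche")]

def validate(program, reference):
    """Return the share of pairs the program reproduces exactly, and the
    failures as (source, produced, expected) triples."""
    failures = []
    for source, expected in reference:
        produced = program(source)
        if produced != expected:
            failures.append((source, produced, expected))
    return 1 - len(failures) / len(reference), failures

score, failures = validate(translate, REFERENCE)
print(score)     # 0.5
print(failures)  # [('the white house', 'la blanche maison', 'la maison blanche')]
```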

In the light of these elements, it should be noted that in machine translation (MT) the issue of “linguistic extension” is essential and deserves closer attention. It can be stated as follows: in a linguistic system A, the information associated with a subgroup of translation units (sentences and expressions) shows a certain regularity and coherence that can be systematized and computerized. The question is whether the properties of system A, while maintaining the same underlying coherence, can be extended to a system B in such a way that the source translation units have adequate equivalents in the target language. If this is possible and the necessary modifications do not affect the internal coherence of system B, we may say that system A and its subgroup of units are linguistically extensible, meaning that they can be transferred by computerized translation.

Let us take an example that a grammar can describe adequately despite its complexity and semantic ambiguity. A human can easily translate the sentence “The Minister of Education met his Interior counterpart” into any language. For a machine to translate it, however, its linguistic properties must be extensible to the receiving system. In this particular case, for example, the possessive phrase (the genitive construction in Arabic) should be transferable. The ellipsis in the recurrent phrases (the Minister of something) should also be acceptable in both languages without major modifications. Moreover, the issue of “predication” poses thorny problems of “portability” between two linguistic systems as different as English and Arabic.
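As a rough illustration, the Python sketch below handles two of the phenomena mentioned for this sentence. First it resolves the ellipsis “his Interior counterpart” into a full “Minister of Interior” phrase. Then it maps each “the Minister of X” phrase onto an Arabic construct state (iḍāfa), in which the head noun takes no article and the dependent noun is definite. The mini-lexicon, the simplified transliteration and the choice to drop the possessive “his” are all assumptions made for the example, and predication is not addressed.

```python
import re

# Hypothetical mini-lexicon in simplified transliteration.
HEADS = {"Minister": "wazir"}
DOMAINS = {"Education": "at-tarbiya", "Interior": "ad-dakhiliyya"}

def resolve_ellipsis(sentence):
    """Rewrite 'his X counterpart' as 'the <Head> of X', reusing the head noun
    of the first 'the <Head> of ...' phrase (the possessive 'his' is dropped)."""
    head = re.search(r"\bthe (\w+) of \w+", sentence, re.IGNORECASE)
    if head is None:
        return sentence
    return re.sub(r"\bhis (\w+) counterpart\b", rf"the {head.group(1)} of \1", sentence)

def transfer_noun_phrases(sentence):
    """Map each 'the <Head> of <Domain>' phrase onto an idafa: the head noun
    carries no article, the dependent noun is definite."""
    phrases = re.findall(r"\b[Tt]he (\w+) of (\w+)", resolve_ellipsis(sentence))
    return [f"{HEADS[h]} {DOMAINS[d]}" for h, d in phrases]

print(transfer_noun_phrases("The Minister of Education met his Interior counterpart"))
# ['wazir at-tarbiya', 'wazir ad-dakhiliyya']
```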

To avoid making problems of correspondence between languages insurmountable, very detailed linguistic indications must be provided to reach the next level of formalization as a prelude to computerization. A machine-readable system of equivalences is thus a set of linguistic formulae in which every formula specifies at least one pair of phrases (see the holistic perspective of translation).

As an example, let us take a set of expressions (SES) in a source text (TEX) such that every expression (EXP) can be associated with one or more indications (IND) shared by all expressions (EXP) of the set (SES) in the text (TEX). This gives the following formula: SES = {IND, EXP, TEX}.

On the basis of this formula, an equivalent formula valid for the target text can be obtained: SES1 = {IND1, EXP1, TEX1}. This formula is justified by a set of expressions in the target text that share relevant linguistic features without necessarily being structurally equivalent to those of the source text. There is no systematic projection of the properties of one system onto another. Where projection does occur, it must follow a grammatical principle whose formulation is subject to the (formal or algorithmic) calculation underlying all expressions in the text. A linguistic property can therefore be projected or not, just as a system can be portable or not, depending on whether sequences can be translated from one language to another.
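The formula can be written as a small data structure. The Python sketch below assumes that indications are plain string labels. It represents the “grammatical principle” as a hypothetical mapping from source indications to target indications. Projection from SES to SES1 succeeds only if every source indication is covered by that mapping, which reflects the point that there is no systematic projection from one system to another.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExpressionSet:
    """SES = {IND, EXP, TEX}: indications shared by all expressions of the set,
    the expressions themselves, and the text they come from."""
    ind: frozenset
    exp: tuple
    tex: str

def project(ses, principle, target_exp, target_tex):
    """Derive SES1 = {IND1, EXP1, TEX1} from SES. `principle` is a hypothetical
    mapping from source to target indications; if it does not cover every
    source indication, projection is refused and None is returned."""
    if not ses.ind.issubset(principle):
        return None
    return ExpressionSet(frozenset(principle[i] for i in ses.ind), tuple(target_exp), target_tex)

source = ExpressionSet(
    frozenset({"NOUN+of+NOUN", "definite"}),
    ("the Minister of Education", "the Minister of Health"),
    "TEX",
)
principle = {"NOUN+of+NOUN": "NOUN+de+NOUN", "definite": "definite"}
target = project(source, principle, ["le ministre de l'Éducation", "le ministre de la Santé"], "TEX1")
print(sorted(target.ind))  # ['NOUN+de+NOUN', 'definite']
print(project(source, {"NOUN+of+NOUN": "NOUN+de+NOUN"}, [], "TEX1"))  # None
```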

By adopting this “formalist” point of view in translation, explicit criteria for the comparison of texts are laid down, each dissected and expressed in the form of adequate equations. According to this method of analyzing translation, there is no “equivalence” between languages but only “correspondence” of structures and linguistic features. As opposed to “equivalents” which can be analyzed according to the similarity criterion, “correspondents” are pairs of objects different on the form level but comparable on the function level.
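The distinction can be stated as a decision rule. In this Python sketch, structures are hypothetical tuples of categories and functions are string labels. A pair is labelled an equivalent when the structures are identical, and a correspondent when they differ but serve the same function. The labels and the example are illustrative only.

```python
def classify_pair(src_structure, tgt_structure, src_function, tgt_function):
    """Return 'equivalent' for identical structures, 'correspondent' for
    different structures with the same function, and None otherwise."""
    if src_structure == tgt_structure:
        return "equivalent"
    if src_function == tgt_function:
        return "correspondent"
    return None

# Hypothetical example: English "Peter's house" vs French "la maison de Pierre".
print(classify_pair(("NOUN", "'s", "NOUN"), ("DET", "NOUN", "de", "NOUN"),
                    "possession", "possession"))  # correspondent
```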

The characterization of these “correspondents”, which may involve semantic imprecision, derives mainly from the choices made during the observation stage. Which elements of comparison should be adopted? We of course exclude from our criteria any subjective consideration of the “beauty” or “elegance” of translations intended for machine translation.


An Outline of the Adopted Approach
Our approach can be associated, from a theoretical point of view, with textual linguistics with significant recourse to the principle of contrastivity and formalization.

Within this approach, the texts used as study material are classified in two ways. The first is by the sources that produced and distributed them (for instance, a newspaper or an official body). The second is by their denotative field, based on explicit semantic considerations (for instance, texts on law or health issues).

Once the field and the type of the text have been well defined, observations focus, on the one hand, on its segmentation and on the constituents of its syntax (the “chunks”), and on the other, on the links between those constituents from a morphological and semantic point of view.

Underlying calculations validate this approach from both a theoretical and a practical point of view. The textual units to be analyzed and formalized must therefore be chosen according to specific concepts such as “recurrence”, “coverage” and “precision”. Statistics are used to detect the most frequent linguistic and translational usages of a structure in a study corpus and to build a description that identifies the most relevant elements.
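The article does not give formulas for “recurrence”, “coverage” and “precision”. The Python sketch below assumes the common reading. Recurrence is the number of attested occurrences of a structure. Coverage is the share of those occurrences that a rule captures. Precision is the share of the rule's matches that are correct. Positions are hypothetical token indices.

```python
def rule_scores(matched, gold):
    """matched: positions where a rule fires; gold: positions annotated as true
    occurrences of the structure. Returns (recurrence, coverage, precision)."""
    hits = matched & gold
    recurrence = len(gold)
    coverage = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(matched) if matched else 0.0
    return recurrence, coverage, precision

# The rule finds 3 of the 5 attested occurrences and fires once wrongly (8).
print(rule_scores({1, 2, 3, 8}, {1, 2, 3, 4, 5}))  # (5, 0.6, 0.75)
```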

Hence, observation deals with what is immediately accessible in the phrases under study, while semantics is not tackled at this point. The use of training corpora and the induction of descriptions are at the heart of the textual approach. The main stages of analysis are the following (reasoning from particular facts to a general conclusion):

1) Segmentation and morphological analysis;

2) Disambiguation of morphological categories;

3) Local and textual syntactic analysis;

4) Analysis of functional syntactic relations.
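The four stages can be chained into a toy pipeline. The Python sketch below is only an illustration. The five-word lexicon, the single disambiguation rule (noun after a determiner, verb otherwise) and the subject/object heuristic are assumptions, far simpler than any real morphosyntactic tagger or chunker.

```python
import re

# Hypothetical lexicon: some words are ambiguous between categories.
LEXICON = {"the": {"DET"}, "a": {"DET"}, "minister": {"NOUN"},
           "visits": {"NOUN", "VERB"}, "school": {"NOUN", "VERB"}}

def segment(text):
    """Stage 1a: split into sentences, then into lowercase word tokens."""
    return [re.findall(r"\w+", s.lower()) for s in re.split(r"(?<=[.!?])\s+", text) if s]

def analyse(tokens):
    """Stage 1b: morphological analysis, i.e. all candidate categories per token."""
    return [(t, LEXICON.get(t, {"UNK"})) for t in tokens]

def disambiguate(analysed):
    """Stage 2: NOUN right after a determiner, otherwise prefer VERB (toy rule)."""
    tagged, prev = [], None
    for token, cats in analysed:
        if len(cats) == 1:
            tag = next(iter(cats))
        elif prev == "DET" and "NOUN" in cats:
            tag = "NOUN"
        else:
            tag = "VERB" if "VERB" in cats else sorted(cats)[0]
        tagged.append((token, tag))
        prev = tag
    return tagged

def chunk(tagged):
    """Stage 3: group DET + NOUN into noun-phrase chunks."""
    chunks, i = [], 0
    while i < len(tagged):
        if tagged[i][1] == "DET" and i + 1 < len(tagged) and tagged[i + 1][1] == "NOUN":
            chunks.append(("NP", tagged[i][0] + " " + tagged[i + 1][0]))
            i += 2
        else:
            chunks.append((tagged[i][1], tagged[i][0]))
            i += 1
    return chunks

def relations(chunks):
    """Stage 4: the NP before the verb is the subject, the NP after it the object."""
    verb = next((i for i, (label, _) in enumerate(chunks) if label == "VERB"), None)
    if verb is None:
        return {}
    rel = {"predicate": chunks[verb][1]}
    before = [text for label, text in chunks[:verb] if label == "NP"]
    after = [text for label, text in chunks[verb + 1:] if label == "NP"]
    if before:
        rel["subject"] = before[-1]
    if after:
        rel["object"] = after[0]
    return rel

for sentence in segment("The minister visits a school."):
    print(relations(chunk(disambiguate(analyse(sentence)))))
# {'predicate': 'visits', 'subject': 'the minister', 'object': 'a school'}
```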

The main difficulty of pre-translation analysis remains the disambiguation of the original textual context. This difficulty is essentially related to the problem of sentence delimitation, which is needed to eliminate potential syntactic relations for a given type of rule (i.e. morphosyntactic rules or “chunking rules”). The problem becomes far more salient in the machine analysis of texts, because difficulties resulting from the ambiguities of morphosyntactic tagging combine with those of segmentation. With current formalisms, it is difficult to automatically reduce the generation of “intrusive analyses”, which will inevitably cause problems during translation (see Chanod’s works).
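The sentence delimitation problem is easy to reproduce. In this Python sketch, a naive splitter cuts after every full stop and so breaks “Dr. Smith” into two sentences. A second version skips a short, hypothetical list of abbreviations. Any such list is incomplete, which is one source of the intrusive analyses mentioned above.

```python
import re

ABBREVIATIONS = {"dr.", "mr.", "e.g.", "i.e."}  # assumed list, far from exhaustive

def naive_split(text):
    """Cut after every '.', '!' or '?' that is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)

def split_with_abbreviations(text):
    """Same rule, but do not cut after a token listed as an abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Smith met the minister. They discussed parsing."
print(naive_split(text))
# ['Dr.', 'Smith met the minister.', 'They discussed parsing.']
print(split_with_abbreviations(text))
# ['Dr. Smith met the minister.', 'They discussed parsing.']
```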

Nevertheless, research in textual linguistics is opening the way to an inductive process of translation. It is becoming possible to formulate inductive generalizations, such as the linguistic “correspondences” actually observed. To advance research, however, it is imperative to systematically implement corroborative tests capable of measuring the validity of the rules adopted.


Limits of Interpretation in Machine Translation
One of the fundamental issues in the approach to translation remains that of the principles governing the interpretation of the meaning to be translated. The perspective adopted here holds that a specific translation mechanism intervenes in the interpretation of phrases, and that the general principles of interpretation are insufficient. This mechanism must, however, be amended to take into account the linguistic markers (tense, mood, linking words, verbal and nominal lexicon) that contribute to the interpretation of the phrases and discourse to be translated. We lay out here a general framework combining a formal representation, the theory of translational formalisms, with an interpretive translation model, the model of contextual deductions, in order to examine specifically the question of translational equivalences. We will show how this approach could be applied to natural language processing as a prelude to translation (CAT and MT).

In fact, a few years ago, new directions in linguistics and semiotics began redefining interpretation in translation and regarded it as an act of cognition passing through a comparative process of possible equivalences. The idea of setting the record straight about interpretation in translation meets the need to adjust practical observations to these new theoretical directions.

To establish the terms of the debate, we must start from Umberto Eco’s book Les Limites de l’interprétation. In his introduction, the author notes that “some have pushed the interpreter’s initiative so far that the problem today is to avoid falling into misinterpretation”. He later adds: “All in all, to say that a text has no end does not mean that every act of interpretation has a happy ending”. This is why the author strives to restore a certain dialectic between the rights of the reader-translator and the rights of the text to be translated.

Using the message “Dear friend, in this basket brought by my slave, there are thirty figs I send you as a gift”, Umberto Eco gives a range of significations and referents, but he asserts that we do not have the right to say that the message could mean anything. It could mean many things, but it would be hazardous to ascribe just any meaning to it. Asserting this means admitting that phrases have a literal meaning: “I know how heated the controversy is in this respect, but I still maintain that, within the limits of a given language, there is a literal meaning for the lexical items, the one dictionaries mention first”. Eco argues that we must define a kind of oscillation, an unstable balance, between the interpreter’s initiative and faithfulness to the text. The functioning of a text can be understood by taking into account the part played by the addressee in its comprehension, realization and interpretation, as well as the way the text itself projects the participation of the reader.

The debate on interpretation in translation rests on two approaches. The first searches for what the author meant to say in the text[6]. The second searches for what the author says in the text, regardless of his intentions, by relying either on textual coherence or on the signification systems of the addressee. In all cases, however, the literal meaning must be used to develop a translation.

Translation criticism tries to explain why the text yields one meaning rather than the other. A translator can potentially come up with an unlimited number of versions. At the end of this process, however, each of them should be tested against textual and linguistic coherence, so that precarious or approximate translations are rejected. A text therefore lends itself to numerous readings without allowing all possible translations. If we cannot tell which translation of a text is best, we can nevertheless tell which ones are incorrect. Every act of translation is a difficult transaction between the translator’s competence and the type of competence a given text requires to be translated rigorously and coherently. Between the unreachable intention of the author (what he meant to say) and the debatable intention of the reader-translator (his interpretation) lies the transparent meaning of the text, which refutes any inadequate or unacceptable translation.

It is difficult to determine what is wrong and what is authentic in a translation, because definitions depend on the issue in question. Nevertheless, in all cases, a sufficient condition for an incorrect meaning is the assertion that phrases from the source text have many equivalents in the target text. Translation is thus not erroneous because of its internal properties but because of a supposed multiple identity between source and target.

Therefore, the sentence “All translators love foreign languages”, for example, does not have many parallel meanings but it accepts in practice several possible translations[7]. On the other hand, it is impossible to reasonably conclude that all these equivalences are identical, structurally speaking, and regardless of the subjective perception of individuals who have produced them.

These different translations are not merely different wordings of the same idea: each structure expresses, stylistically, a different meaning. Consequently, we cannot say that a nominal sentence and a verbal sentence convey the same idea and express the same meaning, even if the words used in the two structures are identical. We know that predication differs in the two cases, because the nominal sentence emphasizes the noun whereas the verbal sentence focuses on the process or action. To declare that two structurally different translations are equivalent to a third, original structure is simply to ignore the specific ways in which linguistic structures express nuances and subtleties of meaning.

To be convinced of the validity of these observations, one can use “retro-translation” as a criterion for discriminating between translations. “Retro-translation” means retranslating the version already translated into the target language back into the source language, without consulting the original. Translating the translated version backwards and “blindly” often reveals that the equivalent structure is not the one taken as the starting point, which demonstrates the inaccuracy of the translation in question.
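Once a blind back-translation is available, retro-translation can be approximated mechanically. The Python sketch below swaps in a crude surface measure: the matching-block ratio from the standard `difflib` module. It compares the original sentence with the back-translations of two French renderings, one verbal and one nominal. The French sentences, their back-translations and the 0.9 threshold are hypothetical, and a real check would compare structures, not only word sequences.

```python
import re
from difflib import SequenceMatcher

def tokens(sentence):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", sentence.lower())

def retro_score(original, back_translation):
    """Surface similarity (0..1) between the original sentence and the blind
    back-translation of its translation."""
    return SequenceMatcher(None, tokens(original), tokens(back_translation)).ratio()

original = "All translators love foreign languages"
# Hypothetical French renderings and their blind back-translations.
candidates = {
    "Tous les traducteurs aiment les langues étrangères":
        "All translators love foreign languages",
    "Chez tous les traducteurs, l'amour des langues étrangères":
        "Among all translators, the love of foreign languages",
}
THRESHOLD = 0.9  # arbitrary cut-off chosen for the illustration

for translation, back in candidates.items():
    score = retro_score(original, back)
    print(f"{score:.2f} {'kept' if score >= THRESHOLD else 'flagged'}: {translation}")
# 1.00 kept: Tous les traducteurs aiment les langues étrangères
# 0.77 flagged: Chez tous les traducteurs, l'amour des langues étrangères
```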

The notion of “possible equivalence” is useful for a translation theory because it helps to decide which meaning interests the translator in his work and what he wants to convey through language. But we must be aware of the fact that, among possible translations, there are inevitable translations, improbable translations, and inadmissible translations.

In a sentence such as “All translators love foreign languages”, the translator must think of the best way of rendering it in the target language. He will first consider the three levels of language: morphological, semantic and syntactic. The inevitable translation takes these levels into account while being linguistically correct and culturally appropriate. The improbable translation moves away from literal accuracy, either by over-translating the original or by creating a particular stylistic effect. Finally, the inadmissible translation gives a semantically different version of the original while being linguistically accurate.
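One reading of this three-way distinction can be written as a decision function. The Python sketch below assumes that a reviewer records four yes/no judgements for each candidate. The field names and the order of the tests are assumptions. A linguistically incorrect candidate is treated as falling outside the possible translations altogether.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    """Hypothetical yes/no judgements recorded for one candidate translation."""
    linguistically_correct: bool
    same_meaning: bool            # semantic content of the original preserved
    literal_accuracy: bool        # morphological, semantic and syntactic levels respected
    culturally_appropriate: bool

def category(a):
    """Map an assessment onto the three categories of possible translations."""
    if not a.linguistically_correct:
        return "not a possible translation"
    if not a.same_meaning:
        return "inadmissible"
    if a.literal_accuracy and a.culturally_appropriate:
        return "inevitable"
    return "improbable"

print(category(Assessment(True, True, True, True)))   # inevitable
print(category(Assessment(True, True, False, True)))  # improbable
print(category(Assessment(True, False, True, True)))  # inadmissible
```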

In this regard, a distinction should be made between “semantic translation” and “critical translation”. The former results from a technique the translator adopts when faced with the linear progression of a text: assigning a certain meaning in accordance with the lexicon of its phrases. The latter is a metalinguistic activity that aims to describe and explain, at the formal level, why a given text yields a given translation to the exclusion of all others, however sensible they may be.

An exemplary translator is required not only to be precise and meticulous but also to pay great attention to the stylistic subtleties of both working languages. This follows the principle that every wording has its own meaning and purpose in the linguistic system that uses it (the “economy of language” principle). A translator who acts in this way will produce a consensual translation free of subjective value judgments. Otherwise, he will be compelled to search in vain for possible meanings and potential ways of rendering them.

Some translators may wonder: “Why be so rigorous if the meaning is understood and conveyed?” Such translators, indulgent or careless as the case may be, are not exemplary translators. Exemplary translators seek the exact meaning and the inevitable translation, the one that can be captured and modeled for natural language processing. But how can this goal be achieved when faced with so many readings and interpretations?

According to the semiotician Peirce, the interpretation of meaning is an action involving the cooperation of three subjects: the sign (e.g. the word rose), its object (the real, tangible flower) and its interpretant (the concept of the red flower). What is important in Peirce’s definition is that it does not take into account an interpreter or conscious subject. It should therefore be remembered, in line with the analyses of Peirce and Eco, how important the distinction is between the meaning system (the sign system) and the process of communication (which requires the presence of an interpreter).

The meaning system is a series of elements with a combinatory rule governing how the elements are arranged among themselves (its syntax). The acceptable sequences of a syntactic system associated with another system can be transferred from one language to another. For example, w+a+t+e+r = water = “drinkable transparent liquid” is transferable into any language in the world without recourse to human interpretation.

In a semiotic system, any content can become a new expression that may be interpreted or translated by another expression in another language. Abduction is a form of inference that tries to interpret the meaning of a phrase accurately and to establish a rule using a word and its context. Recognizing a series of words as a coherent sequence (i.e. as a text) means finding a textual theme able to create a coherent connection between otherwise unlinked data. Identifying a textual theme is an example of abduction. Every translator makes abductions to choose between the numerous possible readings of a text. The criteria of economy of language compel us always to choose the easiest option in the absence of any other selection tool.
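Abductive theme selection can be caricatured in a few lines. In the Python sketch below, each candidate theme is a hypothetical keyword set. The theme linking the most words of the text is chosen. Ties are broken first by preferring the smaller keyword set, a crude stand-in for the “easiest option” required by economy of language, and then alphabetically so that the choice is deterministic.

```python
import re

# Hypothetical candidate themes, each described by a small keyword set.
THEMES = {
    "finance": {"bank", "loan", "interest", "account"},
    "river": {"bank", "water", "fish", "shore"},
}

def abduce_theme(text, themes=THEMES):
    """Pick the theme that links the most words of the text; on a tie, prefer
    the theme with fewer keywords, then the alphabetically first name."""
    words = set(re.findall(r"\w+", text.lower()))
    return min(themes, key=lambda name: (-len(words & themes[name]), len(themes[name]), name))

print(abduce_theme("He opened an account at the bank to get a loan."))  # finance
print(abduce_theme("They sat on the bank and watched the water."))      # river
```

The ambiguous word “bank” alone cannot decide between the themes; it is the surrounding words that make one reading more coherent than the other.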



By Mathieu Guidere
Master in Arabic language and literature and Ph.D in Translation Studies and Applied Linguistics from the University of Paris-Sorbonne,
Lyon 2 University - France
Saint-Cyr Research Centre, France
mathieu.guidere@univ-lyon2.fr
http://perso.univ-lyon2.fr/~mguidere