Computer-Aided Support for Maintaining Large and Complex Books in Multiple Languages (The Translation Assistant)

The specific objective for this year's research project is to provide a method, with appropriate tools and utilities, for reducing the time and cost of maintaining translations of large and complex texts. No tool or method is available at present that reduces the cost and time of translating a second or third version of a document.

Many large and complex documents are maintained in both English and French through a succession of versions or editions. For example, software products go through a sequence of releases of equivalent English and French versions. For each such release, both English and French documentation has to be prepared. Traditionally the software is developed first, then the English (or French) documentation, then a translation into French (or English). (For ease of presentation we will talk about English development and translation into French, with the understanding that the same concepts apply in the reverse direction.) For each successive release of the software, the English documentation has to be updated and then sent for translation. The translation adds expense and delay.

Similar considerations apply to other large and complex, synchronously updated multi-lingual documents, such as legal texts. For simplicity in presentation we will continue with the software example.

Concepts and processes:

In preparing a new release, we are dealing with four distinct documents:

1. The old English document is the documentation for the previous English release.

2. The new English document is the documentation being developed for the 'soon to be released' English version of the software.

3. The old French document is the documentation for the previous French release of the software.

4. The new French document is the documentation for the 'soon to be released' French version of the software.

Once the new base version is ready, the process of preparing the translated new version has several phases:

• pre-processing: preparing the material (document) to be submitted for translation, including any instructions for the translators. Some of these instructions may be embedded in the document.

• translating the material as submitted, according to the instructions

• post-processing the translated materials, as received from the translators. This may include merging newly translated parts with unchanged parts from the previous version. It may also include deleting embedded instructions and doing quality assurance.
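One small part of the post-processing phase, checking that no embedded instructions survive into the published text, can be sketched as follows. This is only an illustration: it assumes the instructions are embedded as SGML-like tags, and the tag names (change, from, to, insert) and the function name leftover_marks are hypothetical choices made for this sketch, not the names used by the project software.

```python
import re

# Hypothetical instruction tag names; the real tag set is an assumption here.
MARK_RE = re.compile(r"</?(?:change|from|to|insert)\b[^>]*>", re.IGNORECASE)


def leftover_marks(translated_text):
    """Return (line_number, tag) pairs for instruction tags still present."""
    found = []
    for line_no, line in enumerate(translated_text.splitlines(), start=1):
        for match in MARK_RE.finditer(line):
            found.append((line_no, match.group(0)))
    return found
```

An empty result means no instruction tags of the assumed kind remain; any reported line still needs manual clean-up before publishing.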

This is followed by the normal processes for preparing the version, i.e., 'publishing' the material as a printed document, as a Windows Help file, etc.

There are two traditional approaches to translation:

• Complete translation is the process of generating the new French document (4) from the new English document (2). This is the most expensive and time-consuming process. It must be done for new software, but it is also commonly used for software updates. (Since the complete document is translated, no pre-processing or post-processing stage is required.)

• Selective translation is the process of generating the new French document (4) by merging unchanged portions of the old French document (3) with translations of the changed parts of the new English document (2). This approach is faster and less expensive, but it is very difficult to manage, since it is easy to miss minor changes or to make mistakes in merging.
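The idea behind selective translation can be sketched in a few lines. The sketch below is illustrative only and is not the project software: it assumes each document is a list of paragraphs, that the old English and old French lists correspond one-to-one, and it uses Python's standard difflib module to find unchanged paragraphs. The function name plan_selective_translation is hypothetical.

```python
from difflib import SequenceMatcher


def plan_selective_translation(old_en, new_en, old_fr):
    """Return (draft_fr, to_translate) for lists of paragraphs.

    Assumption: old_en[i] and old_fr[i] are translations of each other.
    """
    if len(old_en) != len(old_fr):
        raise ValueError("old English and old French must align one-to-one")
    draft_fr = []      # new French draft; None marks a gap awaiting translation
    to_translate = []  # (position in draft_fr, new English paragraph)
    matcher = SequenceMatcher(a=old_en, b=new_en, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged English: reuse the existing French paragraphs.
            draft_fr.extend(old_fr[i1:i2])
        else:
            # "replace" and "insert" carry new English text to translate;
            # for "delete" the range j1..j2 is empty, so nothing is added.
            for j in range(j1, j2):
                to_translate.append((len(draft_fr), new_en[j]))
                draft_fr.append(None)
    return draft_fr, to_translate
```

Only the paragraphs listed in to_translate would go to the translator. The sketch also shows where the management risk comes from: every gap has to be filled at exactly the right position when the translations come back.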

A complicating factor is that document development and translation are often done by different groups in different locations under different contracts, so the process must involve clean hand-overs and require minimal interaction.

Goal:

The overall goal is to make the translation process both faster and less expensive by using computer-aided text editing both to prepare the text and to merge the translated text.

A secondary goal is to make the translation process 'safer'. Quality assurance is a major problem for large and complex documents that have to be produced fast and yet be very accurate. Reuse of previous material reduces the risk, but the process of splitting and merging can increase the risk.

An added constraint is that the technical pre- and post-processing must not increase either the time or the cost compared with a complete translation or with the approach currently used.

Two kinds of user are involved in this approach: the person doing the pre- and post-processing, and the translator. The new process must have minimal impact on the translator and the translation process.

The long-term objective is to discover and develop both a methodology and a technology for developing, maintaining, enhancing, and interlinking large and complex documents representing interconnected knowledge structures rather than just data. Documents in multiple languages are part of this environment.

The goal for this taxation year was to focus on the translation process. There are three technical challenges:

• To design and test an unobtrusive user interface with minimal instructions and training requirements for translators, who are not technically sophisticated, who just want to translate and do not want to be distracted by a new process. Use of word processing is assumed, and standard word processing systems must be supported. To allow the use of basic word processing skills without additional training, a special document representation and a corresponding editing method had to be designed for selective translation.

• To design and test a user interface for the pre- and post-processing. The interface should be usable by the technical writer who develops the new version, or by the editor who reviews the new version (2).

• To design and test software which does the pre- and post-processing, i.e., which prepares the document for selective translation and does any required post-processing. In parallel, a document structure has to be designed for the old and new versions which supports the software functions in keeping the documents aligned. This alignment is crucial for semi-automatically marking the changes and insertions required for the selective translation.

Scientific or technological advancement achieved

A computer-aided approach to translation was invented which has a novel method for preparing a large and complex document for translation. The method applies to a document that is being updated from a previous version for which a translation already exists. The associated software represents advances both in user interface design and in semi-automated text processing.

• Windows 95 or NT software (32-bit) with a novel user interface for preparing a document for selective translation. A special text editor contains three windows. The top window contains the old English version of the document, the middle window contains the new English version, and the bottom window contains the old French version, which is being edited and marked up for the selective translation. The system automatically keeps the three documents aligned in equivalent places, and assists in marking the changes and insertions needed to produce the new French document. The old and new English versions are compared character by character, while the old English and old French versions are compared and aligned on their tag structure (e.g., SGML, HTML, Folio tags).

• A novel document-based user interface for selective translation. All changes and insertions that are required to change the old French to the new French are marked in the old French document. Insertions are marked with unique insertion tags, and contain the new English text that is to be inserted. Changes are marked with special tags, which include the old and new English as well as the old French text. This marked-up document can be edited in any standard word processing system.

• A novel extension to text processing keeps the three documents aligned. Aligning the old and new English documents uses both the internal tag structure and a character by character comparison. Aligning the old English with the old French uses the core tag structure underlying the documents. Not all tags can be used, since the languages have different sentence structures and word sequences. The richer the document, i.e., with embedded jump links and other features, the better for this comparison and alignment approach. The program automatically flags differences in tag structure, in case the old English and old French versions are not fully equivalent. Documents have to be mapped into an SGML, HTML, or Folio tag structure to support document alignment and difference detection. Other document tag structures will be explored in the future (e.g., RTF).
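The tag-based comparison between the old English and old French versions can be illustrated with a small sketch. It is not the project's C implementation: it assumes HTML-like tags, reduces each document to its sequence of tag names, and uses Python's standard difflib module to report where the two sequences diverge. The ignore parameter stands in for the tags that cannot be used because word order differs between the languages; all function names here are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Matches opening and closing tags; attributes are skipped.
TAG_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w.-]*)[^>]*>")


def tag_skeleton(text, ignore=()):
    """Return the sequence of tag names, e.g. ['h1', '/h1', 'p', '/p']."""
    skeleton = []
    for match in TAG_RE.finditer(text):
        name = match.group(2).lower()
        if name not in ignore:
            skeleton.append(match.group(1) + name)
    return skeleton


def skeleton_differences(old_en, old_fr, ignore=()):
    """List (operation, english_tags, french_tags) where the skeletons differ."""
    a = tag_skeleton(old_en, ignore)
    b = tag_skeleton(old_fr, ignore)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

An empty list suggests the two versions share the same core structure; any entry points at a place where the old English and old French are not fully equivalent and human judgment is needed.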

Activities in the taxation year:

1. The three-window comparison and editing component was designed and programmed in C. It loads the relevant parts of the three documents into three buffers, moves through the three buffers while keeping track of tags to keep the documents aligned, and supports the semi-automatic editing functions for deletions and for marking insertions and changes while copying the relevant pieces from the old English and new English documents. Managing the alignment and the buffer structure proved difficult for large and complex documents. A number of approaches were evaluated with a series of prototypes.

2. The document for the translators was designed with SGML-like tags (<change>, <from>, <to>) that carry the instructions on what to do. There was experimentation with the tags so that they are easily understandable by the translators, yet sufficiently distinct from other embedded marks and tags.
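How a single aligned segment might be marked for the translator can be sketched as follows. The exact tag layout is not given in this report, so the nesting, the id attribute, and the insert tag name are assumptions made for illustration, as is the function name mark_segment. In line with the final design, content deletions are applied directly and only insertions and changes are marked.

```python
def mark_segment(seg_id, old_en, new_en, old_fr):
    """Return the text to place in the marked-up French document.

    The tag layout is hypothetical: an insertion carries the new English
    text; a change carries old English, new English, and old French.
    No escaping is done, so text that itself contains these tag names
    would need extra care.
    """
    if old_en == new_en:
        return old_fr  # unchanged: reuse the existing translation
    if not new_en:
        return ""  # content deletion: applied directly, not marked
    if not old_en:
        return f'<insert id="{seg_id}">{new_en}</insert>'
    return (
        f'<change id="{seg_id}">'
        f"<from>{old_en}</from><to>{new_en}</to>{old_fr}</change>"
    )
```

Keeping the old French inside the change mark gives the translator the existing wording and its surrounding context, so that only the difference between the two English texts has to be carried over.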

Systematic experimental investigation:

including analyses and experiments, interpretation, conclusions

The first set of experiments attempted to mark up an output document fully automatically, using previously developed computer-aided editing tools (developed in previous years with SR&ED support). This experimentation failed as soon as large complex documents were used as input materials. It turned out that supposedly matching versions (old English and old French) do not fully match, since last-minute changes can be made to the English version while it is in translation. As soon as the document structures do not match, human judgment is required. It was decided to discontinue the fully automatic approach, since, on anecdotal evidence, these last-minute changes were quite prevalent.

The second set of experiments was based on the first prototypes of Comp_3, the three-window system, and tried to produce snippets of text for selective translation, marking both the copy of the old French and the snippets for later insertion when the snippets would return from translation. On discussion with the translation group for the Royal Bank in Montreal, it became clear that context was important for stylistic consistency, and that the snippet approach was not satisfactory.

The third set of experiments was performed with later prototypes of Comp_3. A strategy was developed of making a copy of the old French version and marking it for deletions, insertions, and changes. In the final version only content insertions and changes are marked to go to translation. Content deletions and tag (document structure) changes are made during the editing process.

Progress made toward the research objectives:

A novel method and appropriate software were developed for marking up a previously translated version to indicate the insertions and changes required to convert it to the new version.

After completion of the research project, the method and the beta prototype software were used to translate a user manual for the Royal Bank payroll technologies system, Images(tm). (Note: the consulting project to use the beta technology for the Royal Bank payroll application is not part of this project, but it reflects the progress.) The client and the translation department seemed satisfied with the approach. The resulting translation needed minimal clean-up to remove some of the editing marks.

Note: The general approach and the software system have not yet been successfully transferred to an in-house document developer or editor. More research may need to be done on that part of the interface.