CLIN2017 Shared Task: Translating Historical Text

Historical texts pose a challenge for automatic text processing tools because their words are spelled differently from their modern equivalents and because the spelling may be inconsistent. One way of solving this problem is to translate the historical texts into modern language and then apply the text processing tools to this modern version. This shared task focuses on this translation step, applied in particular to documents written in seventeenth-century Dutch.


Updated BLEU evaluation script (run as: bleu -s processed gold). BLEU scores of runs Ljubljana-1, Utrecht-1 and Helsinki-4 changed slightly.
The results of the shared task are available, as well as the slides of the overview talk presented at the CLIN2017 conference.

Task description

Seventeenth-century Dutch is similar to modern Dutch but the differences between the two are large enough to cause problems for automatic text processing tools. For example, the sentence:

De honger nu wert swaer in dat lant
contains three old words which are currently spelled differently (wert, swaer and lant). The task is to translate such sentences to their modern equivalents, for example to:
De honger nu werd zwaar in dat land

You may use any method for performing the translation, provided that it is automatic and can process large texts in a reasonable time. For example, a translation lexicon would be very useful for this task. Such a lexicon exists (select "Get lemma") but it returns modern lemmas rather than modern word forms and leaves disambiguation to the user.
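
As an illustration of the lexicon-based approach mentioned above, the following minimal Python sketch translates a sentence by looking up each word in a small dictionary. The lexicon contents and the function names (translate_token, translate_sentence) are assumptions made for this example only; they are not part of the shared task software, and a realistic system would need a much larger lexicon and a way to resolve ambiguous words.

```python
# Hypothetical toy lexicon: historical spelling -> modern word form.
# The three entries are taken from the example sentence above; a real
# lexicon would be far larger and would have to deal with ambiguity.
LEXICON = {
    "wert": "werd",
    "swaer": "zwaar",
    "lant": "land",
}


def translate_token(token, lexicon):
    """Return the modern form of one token, or the token itself if unknown."""
    modern = lexicon.get(token.lower())
    if modern is None:
        return token
    # Preserve an initial capital from the original token.
    if token[:1].isupper():
        return modern.capitalize()
    return modern


def translate_sentence(sentence, lexicon=LEXICON):
    """Translate a whitespace-tokenized sentence word by word."""
    return " ".join(translate_token(t, lexicon) for t in sentence.split())


print(translate_sentence("De honger nu wert swaer in dat lant"))
```

Words that are not in the lexicon are passed through unchanged, so the output always has the same number of tokens as the input.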

The translations can be used for follow-up linguistic processing, for example for assigning part-of-speech tags to the words. For this purpose, it must be clear which word in the original text each translated word corresponds to. This can be achieved with additional metadata, for example encoded in XML, or by keeping the same word order in the translation as in the original text, i.e. by performing a word-by-word translation.
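
When the translation is word-by-word, the correspondence between original and translated words can be recovered from the token positions alone. The sketch below shows one possible way to do this in Python; it assumes whitespace tokenization and an equal number of tokens on both sides, and the function name align_tokens is hypothetical.

```python
def align_tokens(original, translation):
    """Pair each original token with its translation, position by position.

    Assumes whitespace tokenization and a strict word-by-word translation,
    so both sentences must contain the same number of tokens.
    """
    source_tokens = original.split()
    target_tokens = translation.split()
    if len(source_tokens) != len(target_tokens):
        raise ValueError(
            f"token count mismatch: {len(source_tokens)} vs {len(target_tokens)}"
        )
    return list(zip(source_tokens, target_tokens))


pairs = align_tokens(
    "De honger nu wert swaer in dat lant",
    "De honger nu werd zwaar in dat land",
)
for old, new in pairs:
    # Flag the positions where the spelling changed.
    flag = "*" if old != new else ""
    print(f"{old}\t{new}\t{flag}")
```

A translation that merges or splits words breaks this positional alignment; in that case the correspondence would have to be stored explicitly, for example in the XML metadata mentioned above.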

Software and data

We provide software and access to data which can be used for starting with the shared task:

This includes the following software:

Note that the target language for the shared task is 21st-century Dutch. The shared task data contains a 19th-century text as target because that was the closest available text in a parallel pair. You are free to use other texts.

Participants can use the software and data as a base for developing their system. Additional test data sets will be released two weeks before the conference for testing the final version of the systems.


September 2016
    Call for participation
Monday 30 January 2017 12:00
    Release of the test data sets
Friday 3 February 2017 23:59
    Deadline for the submission of test results. Send them by email to Erik Tjong Kim Sang: erikt(at) or
Friday 10 February 2017 10:30
    Overview talk at CLIN2017. Poster presentations of participants.
May 2017
    Submission of overview paper to CLIN Journal


Erik Tjong Kim Sang (Meertens Institute Amsterdam)


