[Userinvolvement] NLP tools for historical texts

Martin Wynne martin.wynne at bodleian.ox.ac.uk
Wed Jan 2 12:38:45 CET 2019


Happy New Year to everyone! Here's something to think about as we plan 
activities for this year.

I'm trying to find out who around the CLARIN network and beyond has been 
working on tokenization, lemmatization and linguistic annotation for 
texts in *historical* variants of European languages, particularly from 
the seventeenth and eighteenth centuries. I am planning to propose a 
CLARIN UI workshop on the topic for 2019, and would like to find out who 
might be involved.

Relevant work could involve any of the following:

- developing bespoke tools for particular historical varieties;
- retraining annotation tools developed for contemporary languages, so 
that they work on historical varieties;
- developing lexical resources (e.g. machine-readable historical 
dictionaries, wordlists of spelling variants, etc.);
and of course, other approaches that I haven't thought of yet.

I've already been in touch with some folks working in this area on 
English, French and German, listed below, but more contacts for these 
and other languages are welcome:

English: Paul Rayson and Alistair Baron in Lancaster, and Marc Alexander 
and Fraser Dallachy in Glasgow;
French: Gilles Souvay at ATILF in Nancy, and Marine Riguet at the Sorbonne;
German: Alexander Geyken and Bryan Jurish at BBAW;
and I've also talked with Pavel Stranak about retraining the LINDAT NLP 
tools for working on these and other languages.

Here in Oxford we are doing some work on tagging (mostly) 
eighteenth-century texts with original orthography in French, with future work 
planned on English, German and Italian, in order to improve search 
functionality for researchers in the social sciences and humanities who 
are working with historical texts.

Please respond either to the list, or directly to me and I'll summarize 
the responses for Darja and everyone. Please feel free to forward this 
message to anyone who you think might be interested or able to help.

Best wishes,
Martin

