[Userinvolvement] NLP tools for historical texts
Martin Wynne
martin.wynne at bodleian.ox.ac.uk
Wed Jan 2 12:38:45 CET 2019
Happy New Year to everyone! Here's something to think about as we plan
activities for this year.
I'm trying to find out who around the CLARIN network and beyond has been
working on tokenization, lemmatization and linguistic annotation for
texts in *historical* variants of European languages, particularly from
the seventeenth and eighteenth centuries. I am planning to propose a
CLARIN UI workshop on the topic for 2019, and would like to find out who
might like to be involved.
Relevant work could involve any of the following:
- developing bespoke tools for particular historical varieties;
- retraining annotation tools developed for contemporary languages, so
that they work on historical varieties;
- developing lexical resources for historical varieties (e.g.
machine-readable historical dictionaries, wordlists of spelling variants, etc.);
and of course, other approaches that I haven't thought of yet.
I've already been in touch with some folks working in this area on
English, French and German, listed below, but more contacts for these
and other languages are welcome:
English: Paul Rayson and Alistair Baron in Lancaster, and Marc Alexander
and Fraser Dallachy in Glasgow;
French: Gilles Souvay at ATILF in Nancy, and Marine Riguet at the Sorbonne;
German: Alexander Geyken and Bryan Jurish at BBAW;
and I've also talked with Pavel Stranak about retraining the LINDAT NLP
tools to work on these and other languages.
Here in Oxford we are doing some work on tagging (mostly) eighteenth-century
texts with original orthography in French, with future work
planned on English, German and Italian, in order to improve search
functionality for researchers in the social sciences and humanities who
are working with historical texts.
Please respond either to the list, or directly to me and I'll summarize
the responses for Darja and everyone. Please feel free to forward this
message to anyone you think might be interested or able to help.
Best wishes,
Martin