[Userinvolvement] Overview of manually annotated text corpora

Lenardič, Jakob Jakob.Lenardic at ff.uni-lj.si
Fri Jan 25 10:48:52 CET 2019


Dear Jan,


Thank you for the exhaustive information on the corpus. We'll be preparing an updated report shortly, and all of this will be taken into account. I'll delete the "VAGUE" descriptor and provide a link to the agreement. As for the spoken corpora: this time around we only included corpora containing written materials, but this is good information to have nonetheless, and such corpora will be included in the near future.


Best,
Jakob


Univerza v Ljubljani
Filozofska fakulteta    asist. Jakob Lenardič


Oddelek za prevajalstvo / Department of translation

Filozofska fakulteta / Faculty of arts

Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
T.: 241-1143
Jakob.Lenardic at ff.uni-lj.si<mailto:Jakob.Lenardic at ff.uni-lj.si>, www.ff.uni-lj.si<http://www.ff.uni-lj.si/>
________________________________
From: Odijk, J.E.J.M. (Jan) <j.odijk at uu.nl>
Sent: Friday, January 25, 2019 9:32:47 AM
To: Fišer, Darja; ncf at lists.clarin.eu; userinvolvement at lists.clarin.eu
Cc: Lenardič, Jakob
Subject: RE: Overview of manually annotated text corpora

Dear Darja,


Some comments on the Dutch corpora:

  *   LASSY-Small: it is available not only via download but also through the following treebank applications:
     *   PaQu: http://www.let.rug.nl/alfa/paqu
     *   GrETEL 3.0: https://gretel.ccl.kuleuven.be/gretel3/
     *   GrETEL 4.0 (under development): http://gretel.hum.uu.nl/gretel4/ng/home
  *   LASSY-Small also has lemma information (to be added to your table on p. 4).
  *   The license has the label VAGUE in your document, but there is a very concrete agreement document for this: https://ivdnt.org/images/stories/producten/voorwaarden/voorwaarden_lassy-klein-corpus.pdf I admit it has been written in natural language, and in Dutch at that, so the predicate VAGUE surely applies, but I believe more specific labels are more appropriate, and a link to the agreement would be perfect. The agreement specifies restrictions because the treebank contains material from commercial publishers.
  *   Not mentioned (and probably not described separately in the VLO) is the Spoken Dutch Corpus Treebank (Corpus Gesproken Nederlands, CGN Treebank), which (if I am not mistaken) has manually verified annotations (syntactic structures). It is also obtainable from the Dutch Language Institute for download and searchable through the applications mentioned above under LASSY-Small. It should have the same properties for annotation as LASSY-Small in your table.
  *   I will consult some more people about some resources that were not mentioned and for which I suspect (but am not sure) that they have manually verified annotations.
  *   Please add page numbers in your document

Jan

From: ncf-bounces at lists.clarin.eu <ncf-bounces at lists.clarin.eu> On Behalf Of Fišer, Darja
Sent: Wednesday, 23 January 2019 19:01
To: ncf at lists.clarin.eu; userinvolvement at lists.clarin.eu
Cc: Lenardič, Jakob <Jakob.Lenardic at ff.uni-lj.si>
Subject: [Ncf] Overview of manually annotated text corpora

Dear all,

I’m happy to share the draft report of the manually annotated text corpora in the CLARIN infrastructure. If you see anything that needs to be improved or added, please let us know:
CE-2019-1384-Manually-annotated-corpora-report.pdf<https://office.clarin.eu/v/CE-2019-1384-Manually-annotated-corpora-report.pdf>
CE-2019-1384-Manually-annotated-corpora-report.docx<https://office.clarin.eu/v/CE-2019-1384-Manually-annotated-corpora-report.docx>

We will soon be adding the overview to our webpage as well.

Best,

Darja Fišer
—
Univerza v Ljubljani
Filozofska fakulteta

doc. dr. Darja Fišer, Assistant Professor
http://lojze.lugos.si/darja/

Oddelek za prevajalstvo / Department of translation

Filozofska fakulteta / Faculty of arts

Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia

darja.fiser at ff.uni-lj.si<mailto:darja.fiser at ff.uni-lj.si>, www.ff.uni-lj.si<http://www.ff.uni-lj.si>



