[Userinvolvement] Overview of manually annotated text corpora

Neeme Kahusk neeme.kahusk at ut.ee
Mon Jan 28 13:01:04 CET 2019


Dear Darja,

Thank you for the great report! As I was involved in collecting data on Estonian corpora, I would like to add some notes about them.

* The Report lists 4 monolingual Estonian corpora, but there should be 6. I'm not quite sure which corpora are missing, because the link to the Spreadsheet does not work in the PDF file. I used an old link I had received earlier, and all of them are listed there. I checked the Excluded corpora sheet as well. Yes, TempEst is unavailable at the moment, but EstAnaphora is available both via https://metashare.ut.ee and GitHub. I added the link to the corpus to the Spreadsheet.

* The Corpus of morphologically disambiguated text (http://doi.org/10.15155/1-00-0000-0000-0000-00085L) is available via the Korp concordancer as well; the link has now been added to the Resource Description.

* Multilingual corpora are listed in Section 2.3.1; Estonian is missing from [2], I suppose. Estonian is also part of the MULTEXT-East "1984" annotated corpus 4.0 and should be represented in this Section.

* No Estonian resources are listed under "Other annotation layers" (Section 3.6), but at least these would qualify:

    - Estonian TimeML Annotated Corpus (http://doi.org/10.15155/1-00-0000-0000-0000-0015CL) - Temporal semantic annotations

    - EstAnaphora (http://doi.org/10.15155/1-00-0000-0000-0000-0016AL) - Anaphora annotation

    - Semantically disambiguated corpus of Estonian (http://doi.org/10.15155/1-00-0000-0000-0000-00081L) - Word sense disambiguation


Best wishes,

Neeme Kahusk

On 23.01.19 20:00, Fišer, Darja wrote:
Dear all,

I’m happy to share the draft report on the manually annotated text corpora in the CLARIN infrastructure. If you see anything that needs to be improved or added, please let us know:
CE-2019-1384-Manually-annotated-corpora-report.pdf<https://office.clarin.eu/v/CE-2019-1384-Manually-annotated-corpora-report.pdf>
CE-2019-1384-Manually-annotated-corpora-report.docx<https://office.clarin.eu/v/CE-2019-1384-Manually-annotated-corpora-report.docx>

We will soon be adding the overview to our webpage as well.

Best,

Darja Fišer
--
Univerza v Ljubljani
Filozofska fakulteta    doc. dr. Darja Fišer, Assistant Professor
http://lojze.lugos.si/darja/

Oddelek za prevajalstvo / Department of translation

Filozofska fakulteta / Faculty of arts

Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia

darja.fiser at ff.uni-lj.si<mailto:darja.fiser at ff.uni-lj.si>, www.ff.uni-lj.si<http://www.ff.uni-lj.si>
