[Userinvolvement] Manually annotated training corpora - CLARIN resource families

Fišer, Darja Darja.Fiser at ff.uni-lj.si
Tue Dec 4 14:41:25 CET 2018


Hi Martin,

thanks, I’ve added the column as you suggested.

Best,

Darja
—
Univerza v Ljubljani
Filozofska fakulteta    doc. dr. Darja Fišer, Assistant Professor
http://lojze.lugos.si/darja/

Oddelek za prevajalstvo / Department of translation

Filozofska fakulteta / Faculty of arts

Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia

darja.fiser at ff.uni-lj.si, www.ff.uni-lj.si

On 4 Dec 2018, at 13:59, Martin Wynne <martin.wynne at bodleian.ox.ac.uk> wrote:

Hi,

I've added the BNC Sampler, which was manually post-edited to correct
the automatically assigned part-of-speech tags. Perhaps we should add
another column for 'Annotation notes' so that we can include information
like this?

Best,
Martin

On 04/12/2018 12:22, Koenraad De Smedt wrote:
Hi,

Almost no treebanks are fully manually annotated, but a lot of
treebanks are semi-manually annotated. Machine parses are often
corrected as needed by annotators. In other cases machine parses are
manually disambiguated. I am going to assume that those semi-manually
constructed treebanks, which are indeed mentioned as training corpora,
are also of interest for the current survey.

Best,
Koenraad

On 4 Dec 2018, at 10:59, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:

Hi Jakob,

I am not sure I understand the "training corpus" concept, but if you
mean any manually annotated resource (which by definition can be used
for supervised training), then the list is missing at the very least
all the treebanks.

-Pavel



On 3 Dec 2018, at 19:06, Lenardič, Jakob <Jakob.Lenardic at ff.uni-lj.si> wrote:

Dear all,

as part of the CLARIN Resource Families initiative, we are
conducting a survey of *manually annotated training* corpora. We have
prepared the preliminary results based on the VLO and the national
CLARIN repositories:

https://docs.google.com/spreadsheets/d/1A12KnLUboHu-SPRY5HfvpkuV6clhN_HFmp7IU_jqC9I/edit?usp=sharing

We would appreciate it if you could add any resources and information
that we have missed and correct any mistakes we have made. Note that
we are looking for corpora that have been designed specifically for
training language tools, such as PoS taggers, named-entity
recognizers, dependency parsers, etc. Comments and suggestions by
email are welcome too. We are collecting feedback until December 20,
after which we will prepare the report.

Best,
Jakob


Univerza v Ljubljani
Filozofska fakulteta    asist. Jakob Lenardič

Oddelek za prevajalstvo / Department of translation
Filozofska fakulteta / Faculty of arts
Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
T.: 241-1143
Jakob.Lenardic at ff.uni-lj.si, www.ff.uni-lj.si
Univerza v Ljubljani <http://www.uni-lj.si/>

_______________________________________________
Userinvolvement mailing list
Userinvolvement at lists.clarin.eu
https://lists.clarin.eu/cgi-bin/mailman/listinfo/userinvolvement
