[Userinvolvement] Manually annotated training corpora - CLARIN resource families
Fišer, Darja
Darja.Fiser at ff.uni-lj.si
Tue Dec 4 14:41:25 CET 2018
Hi Martin,
thanks, I’ve added the column as you suggested.
Best,
Darja
--
Univerza v Ljubljani
Filozofska fakulteta doc. dr. Darja Fišer, Assistant Professor
http://lojze.lugos.si/darja/
Oddelek za prevajalstvo / Department of translation
Filozofska fakulteta / Faculty of arts
Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
darja.fiser at ff.uni-lj.si, www.ff.uni-lj.si
On 4 Dec 2018, at 13:59, Martin Wynne <martin.wynne at bodleian.ox.ac.uk> wrote:
Hi,
I've added the BNC Sampler, which was manually post-edited to correct
the automatically assigned part-of-speech tags. Perhaps we should add
another column for 'Annotation notes' so that we can include information
like this?
Best,
Martin
On 04/12/2018 12:22, Koenraad De Smedt wrote:
Hi,
Almost no treebanks are fully manually annotated, but a lot of
treebanks are semi-manually annotated. Machine parses are often
corrected as needed by annotators. In other cases machine parses are
manually disambiguated. I am going to assume that those semi-manually
constructed treebanks, which are indeed mentioned as training corpora,
are also of interest for the current survey.
Best,
Koenraad
On 4 Dec 2018, at 10:59, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
Hi Jakob,
I am not sure I understand the "training corpus" concept, but if you mean
any manually annotated resource (which by definition can be used for
supervised training), then the list is missing at the very least all
the treebanks.
-Pavel
On 3 Dec 2018, at 19:06, Lenardič, Jakob <Jakob.Lenardic at ff.uni-lj.si> wrote:
Dear all,
as part of the CLARIN Resource Families initiative, we are
conducting a survey of *manually annotated training* corpora. We have
prepared the preliminary results based on the VLO and the national
CLARIN repositories:
https://docs.google.com/spreadsheets/d/1A12KnLUboHu-SPRY5HfvpkuV6clhN_HFmp7IU_jqC9I/edit?usp=sharing
We would appreciate it if you could add any resources and info that
we have missed and correct any mistakes we have made. Note that we
are looking for corpora that have been designed specifically for
training language tools, such as PoS taggers, named-entity
recognizers, dependency parsers, etc. Comments and suggestions by
email are welcome too. We are collecting feedback until 20 December,
after which we will prepare the report.
Best,
Jakob
Univerza v Ljubljani
Filozofska fakulteta asist. Jakob Lenardič
Oddelek za prevajalstvo / Department of translation
Filozofska fakulteta / Faculty of arts
Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
T.: 241-1143
Jakob.Lenardic at ff.uni-lj.si, www.ff.uni-lj.si
Univerza v Ljubljani <http://www.uni-lj.si/>
_______________________________________________
Userinvolvement mailing list
Userinvolvement at lists.clarin.eu
https://lists.clarin.eu/cgi-bin/mailman/listinfo/userinvolvement