[Userinvolvement] Manually annotated training corpora - CLARIN resource families

Koenraad De Smedt desmedt at uib.no
Tue Dec 4 13:22:15 CET 2018


Hi,

Almost no treebanks are fully manually annotated, but a lot of treebanks are semi-manually annotated. Machine parses are often corrected as needed by annotators. In other cases machine parses are manually disambiguated. I am going to assume that those semi-manually constructed treebanks, which are indeed mentioned as training corpora, are also of interest for the current survey.

Best,
Koenraad

> On 4 Dec 2018, at 10:59, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
> 
> Hi Jakob,
> 
> I am not sure I understand the "training corpus" concept, but if you mean any manually annotated resource (which by definition can be used for supervised training), then the list is missing at the very least all the treebanks.
> 
> -Pavel
> 
> 
> 
>> On 3 Dec 2018, at 19:06, Lenardič, Jakob <Jakob.Lenardic at ff.uni-lj.si> wrote:
>> 
>> Dear all,
>> 
>> as part of the CLARIN Resource Families initiative, we are conducting a survey of manually-annotated training corpora. We have prepared the preliminary results based on the VLO and the national CLARIN repositories:
>> 
>> https://docs.google.com/spreadsheets/d/1A12KnLUboHu-SPRY5HfvpkuV6clhN_HFmp7IU_jqC9I/edit?usp=sharing
>>  
>> We would appreciate it if you could add any resources and information that we have missed and correct any mistakes we have made. Note that we are looking for corpora that have been designed specifically for training language tools, such as PoS taggers, named-entity recognizers, dependency parsers, etc. Comments and suggestions by email are welcome too. We are collecting feedback until December 20, after which we will prepare the report.
>>  
>> Best,
>> Jakob
>> 
>> 
>> Univerza v Ljubljani
>> Filozofska fakulteta	asist. Jakob Lenardič 
>> 
>> 
>> Oddelek za prevajalstvo / Department of translation
>> 
>> Filozofska fakulteta / Faculty of arts
>> 
>> Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
>> T.: 241-1143
>> Jakob.Lenardic at ff.uni-lj.si, www.ff.uni-lj.si
>> _______________________________________________
>> Userinvolvement mailing list
>> Userinvolvement at lists.clarin.eu
>> https://lists.clarin.eu/cgi-bin/mailman/listinfo/userinvolvement
