[Userinvolvement] Manually annotated training corpora - CLARIN resource families

Pavel Stranak stranak at ufal.mff.cuni.cz
Tue Dec 4 11:13:13 CET 2018



> On 4 Dec 2018, at 10:59, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
> 
> Hi Jakob,
> 
> I am not sure I understand the "training corpus" concept, but if you mean any manually annotated resource (which by definition can be used for supervised training), then the list is missing, at the very least, all the treebanks.

Of course, treebanks are far from the only such resources (again, if I understand the concept correctly). Even just the subset of resources in our repository that explicitly mention "train" (supposing we have no railroad data :) comes to 62 records: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype=hasfile&filter_relational_operator=equals&filter=yes

Best,

Pavel

