[Userinvolvement] Manually annotated training corpora - CLARIN resource families

Pavel Stranak stranak at ufal.mff.cuni.cz
Tue Dec 4 11:38:51 CET 2018



> On 4 Dec 2018, at 11:13, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
> 
> 62 records: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype=hasfile&filter_relational_operator=equals&filter=yes

I forgot to also filter by type, so only 42 corpora explicitly mention "train": https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype_0=hasfile&filter_relational_operator_0=equals&filter_0=yes&filtertype=type&filter_relational_operator=equals&filter=corpus

There are also 40 treebank annotation records classified as corpus: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype_0=hasfile&filter_relational_operator_0=equals&filter_0=yes&filtertype=type&filter_relational_operator=equals&filter=corpus

However, this list still misses many annotated resources, e.g. this one, which is classified as LexicalConceptualResource: http://hdl.handle.net/11234/1-1457

So you may want to look for "gold": https://lindat.mff.cuni.cz/repository/xmlui/discover?query=gold&filtertype=hasfile&filter_relational_operator=equals&filter=yes

"manually -treebank -gold" gets me 30 more records: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=manually+-treebank+-gold&rpp=10&filtertype=hasfile&filter_relational_operator=equals&filter=yes
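
The search links above all follow the same URL pattern, so they can be generated rather than typed by hand. Below is a minimal sketch, assuming only what is visible in those links: the parameter names (query, rpp, filtertype, filter_relational_operator, filter) and the "_0" suffix on the earlier filter are copied from the URLs, while the function name discover_url and its arguments are hypothetical. It only builds the URL string and does not contact the repository.

```python
from urllib.parse import urlencode

# Discovery endpoint as it appears in the links in this message.
BASE = "https://lindat.mff.cuni.cz/repository/xmlui/discover"


def discover_url(query, filters=(), rpp=10):
    """Build a discovery URL (hypothetical helper).

    filters is a sequence of (filtertype, value) pairs. Mirroring the
    links above, the last filter uses unsuffixed parameter names and
    earlier ones get a numeric suffix (_0, _1, ...). This numbering is
    an assumption read off the observed URLs, not a documented API.
    """
    params = [("query", query), ("rpp", rpp)]
    last = len(filters) - 1
    for i, (ftype, value) in enumerate(filters):
        suffix = "" if i == last else f"_{i}"
        params.append((f"filtertype{suffix}", ftype))
        params.append((f"filter_relational_operator{suffix}", "equals"))
        params.append((f"filter{suffix}", value))
    # urlencode encodes spaces as "+", as in the "manually -treebank -gold" link.
    return BASE + "?" + urlencode(params)


# Records with files that mention "train", restricted to type "corpus".
print(discover_url("train", [("hasfile", "yes"), ("type", "corpus")]))

# Records with files matching "manually" but not "treebank" or "gold".
print(discover_url("manually -treebank -gold", [("hasfile", "yes")]))
```

Swapping the query string (for example "gold") or the type filter value is then a one-argument change.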

Anyway, this is roughly how I would look for manually annotated data for training classifiers. I hope it helps.

I am just not quite sure why I would look for such a list :-)

-Pavel

