[Userinvolvement] Manually annotated training corpora - CLARIN resource families
Pavel Stranak
stranak at ufal.mff.cuni.cz
Tue Dec 4 11:38:51 CET 2018
> On 4 Dec 2018, at 11:13, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
>
> 62 records: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype=hasfile&filter_relational_operator=equals&filter=yes
I forgot to also filter by type, so only 42 corpora explicitly mention train: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype_0=hasfile&filter_relational_operator_0=equals&filter_0=yes&filtertype=type&filter_relational_operator=equals&filter=corpus
There are also 40 assorted treebank annotation records classified as corpus: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=train&rpp=10&filtertype_0=hasfile&filter_relational_operator_0=equals&filter_0=yes&filtertype=type&filter_relational_operator=equals&filter=corpus
However, this list still misses many annotated resources, e.g. this one, which is classified as LexicalConceptualResource: http://hdl.handle.net/11234/1-1457
So you may want to look for "gold": https://lindat.mff.cuni.cz/repository/xmlui/discover?query=gold&filtertype=hasfile&filter_relational_operator=equals&filter=yes
"manually -treebank -gold" gets me 30 more records: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=manually+-treebank+-gold&rpp=10&filtertype=hasfile&filter_relational_operator=equals&filter=yes
Anyway, this is approximately how I would look for any manually annotated data for training classifiers. I hope it helps.
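The searches above can also be assembled programmatically. What follows is only a minimal sketch: the helper name discover_url is hypothetical, and the parameter layout (every filter except the last carries a numeric suffix such as _0, and the last has none) is inferred solely from the URLs quoted in this message, not from the repository's documentation.

```python
from urllib.parse import urlencode

# Base address of the discovery page, taken from the URLs in this message.
BASE = "https://lindat.mff.cuni.cz/repository/xmlui/discover"


def discover_url(query, filters, rpp=10):
    """Build a discover URL (hypothetical helper).

    filters is a list of (filtertype, operator, value) tuples.
    Assumption: the suffix scheme mirrors the URLs above, where all
    filters but the last are numbered (_0, _1, ...) and the last one
    is unnumbered.
    """
    params = [("query", query), ("rpp", str(rpp))]
    last = len(filters) - 1
    for i, (ftype, op, value) in enumerate(filters):
        suffix = "" if i == last else f"_{i}"
        params.append((f"filtertype{suffix}", ftype))
        params.append((f"filter_relational_operator{suffix}", op))
        params.append((f"filter{suffix}", value))
    # urlencode encodes spaces as "+", matching "manually+-treebank+-gold".
    return BASE + "?" + urlencode(params)


# Records with files that mention "train", restricted to type corpus.
train_corpora = discover_url(
    "train",
    [("hasfile", "equals", "yes"), ("type", "equals", "corpus")],
)

# Records with files matching "manually" but not "treebank" or "gold".
manual_only = discover_url(
    "manually -treebank -gold",
    [("hasfile", "equals", "yes")],
)

print(train_corpora)
print(manual_only)
```

Under those assumptions the two calls reproduce the "train" corpus search and the "manually -treebank -gold" search given above; other keywords such as "gold" can be substituted in the same way.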
I am just not quite sure why I would look for such a list :-)
-Pavel