[Userinvolvement] CLARIN Tool Families: Overview of Named Entity Recognizers in the CLARIN infrastructure

Pavel Stranak stranak at ufal.mff.cuni.cz
Fri Nov 8 13:18:39 CET 2019

Hi Darja and all,

I did a quick check of our NameTag information and I see two issues:
- We provide models for both Czech and English, and both are available online, so I added English to your list.
- The "NER categories" column. At least for our system, the recognised entities (types and sub-types) are a feature of the model (and ultimately of the training corpus), not of the tool. This is specified in detail in the NameTag documentation that you link to. In short, to give one example, this is the schema of entities recognised by the 'czech-cnec2.0-<version>' models: http://ufal.mff.cuni.cz/~strakova/cnec2.0/ne-type-hierarchy.pdf. For the English models the schema is very different, because there is no training dataset with such a detailed classification of entities.

So should we duplicate the tool's line for each model? I don't see another way to fill it in. The categories must be per model, at least for NameTag.
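
To make the "one line per model" idea concrete, here is a minimal sketch of how the rows could look. It is only an illustration, not the spreadsheet's actual layout: the field names, the `rows_per_model` helper and the `<english-model>` placeholder are all assumptions, and the category column just points at the schema instead of listing the types.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    """One spreadsheet row: a tool paired with one of its models (hypothetical layout)."""
    tool: str
    model: str
    language: str
    ner_categories: str  # pointer to the entity schema, not the full type list


def rows_per_model(tool, models):
    """Expand one tool into one row per (model, language, schema) triple."""
    return [ModelEntry(tool, name, language, schema)
            for name, language, schema in models]


nametag_rows = rows_per_model("NameTag", [
    # Model name pattern and schema URL as given in this message.
    ("czech-cnec2.0-<version>", "Czech",
     "http://ufal.mff.cuni.cz/~strakova/cnec2.0/ne-type-hierarchy.pdf"),
    # Placeholder: the English model name and its schema are not spelled out here.
    ("<english-model>", "English", "see the NameTag documentation"),
])

for row in nametag_rows:
    print(f"{row.tool} | {row.model} | {row.language} | {row.ner_categories}")
```

The tool name repeats on every row, but each row then carries the categories that actually belong to that model.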


> On 7 Nov 2019, at 17:31, Fišer, Darja <Darja.Fiser at ff.uni-lj.si> wrote:
> Dear all,
> Jakob and I have started the second tool survey with which we will expand the CLARIN Resource Families initiative. It is for tools for Named Entity Recognition that are provided by national CLARIN consortia. Please find the spreadsheet where we’re collecting the data below:
> https://docs.google.com/spreadsheets/d/1W7Yv-HMUt0LGsK19btJ_wMK-fmtXWTt0jmSq7OKnrcc/edit?usp=sharing
> We have already provided input for three tools (NameTag by LINDAT, janes-ner by CLARIN.SI, and PolDeepNer by CLARIN-PL) to show what kind of information we are looking for with this survey, so please use these entries as a template when providing information on your consortium's NER tools. In contrast to previous surveys of resources, we did not search for the tools of all CLARIN countries ourselves, since tools are at the moment generally less likely to be discoverable through CLARIN repositories. We kindly request your input by 28 November.
> Best,
> Darja and Jakob
> Assoc. Prof. dr. Darja Fišer
> Univerza v Ljubljani
> Oddelek za prevajalstvo / Department of translation
> Filozofska fakulteta / Faculty of arts
> Aškerčeva cesta 2, SI-1000 Ljubljana, Slovenija / Slovenia
> darja.fiser at ff.uni-lj.si, www.ff.uni-lj.si
