[Userinvolvement] Supporting citation

Pavel Stranak stranak at ufal.mff.cuni.cz
Fri Oct 2 12:41:13 CEST 2015


Excellent, now you are getting directly to the points I think we really should improve further :)

> On 02 Oct 2015, at 11:48, Thorsten Trippel <thorsten.trippel at uni-tuebingen.de> wrote:
> 
> Dear all and Pavel (who is actually also included in all...),
> 
> First: Sorry Pavel, I did not intend to sound offensive, I second everything you said here.

No offense taken.

> Following up on that discussion: if someone wants to measure the impact of cited data, it is not easy without resolving the handles. I don't think that resolving the PID will be the solution to the citation issue, though all information may be available in the CMDI metadata you can get via the PID. Hence to support citation, we probably should recommend something to give us and human readers of a publication some additional clue that the resource is somewhere in the CLARIN community.
> 
> Lindat is actually a pretty good example: it is a certified repository, using PIDs, the system is transparent, etc.
> 
> So let me ask you (that is everyone):
> 
> 1. What would you answer to a user coming to you saying: "I used António Branco's Nexing Corpus, which I found at https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-386. It has the PID http://hdl.handle.net/11372/LRT-386 and seems to have been published 2014. I don't know how to cite this.

Here I would ask: did you look at the top of the page you reference, where it says "Please use the following text to cite this item or export to a predefined format"? Why did you not use it (copy & paste) when making the reference here?

> Bibtex doesn't have a @languageresource type. How do I cite it?

The simple answer is: use the "bibtex" button we provide there.

The answer for this discussion is: this is a real problem. So far we do it via @misc, using note={} to include what we need. But yes, we should definitely investigate whether there is a better option.
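
As a rough illustration of the @misc workaround, the sketch below assembles such an entry in Python. The helper name, the citation key, the repository wording and the exact field layout are assumptions made for this example, not what the repository's "bibtex" button actually emits; only the author, title, year and handle come from the question above.

```python
def misc_entry(key, author, title, year, handle_url, repository):
    """Build a BibTeX @misc entry for a language resource (illustrative only)."""
    fields = [
        ("title", title),
        ("author", author),
        ("year", str(year)),
        # "url" is not a classic BibTeX field, but many styles accept it.
        ("url", handle_url),
        # BibTeX has no dedicated field for the hosting repository,
        # so it is carried in note={}.
        ("note", repository),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


entry = misc_entry(
    key="branco2014nexing",  # hypothetical citation key
    author="Branco, António",
    title="Nexing Corpus",
    year=2014,
    handle_url="http://hdl.handle.net/11372/LRT-386",
    repository="LINDAT/CLARIN digital library",  # assumed wording
)
print(entry)
```

The point of the sketch is only that the repository name survives in the exported entry, so it ends up in the paper's bibliography next to the PID.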

> How do I do this in Microsoft Word?"

Exactly, that is one of the formats we should add (MS Word bibliography XML, CSL for OpenOffice, Zotero, Mendeley, etc.). We could certainly use much richer options, not only in citation (meta)data formats, but also in directly producing the required formatting, so that an already formatted rich-text citation can simply be copied. Currently we have just that one format directly there in the yellow box. See http://crosscite.org/citeproc/ for comparison.

We are definitely open to the idea of extending the current "citation box" in the direction of a more general citation export and formatting tool. Most of the backend stuff needed for this already exists (bibutils, citeproc), but it would still be significant work. We have already started discussing the options with ERIC, but nothing has happened yet. Any suggestions, and especially offers of cooperation (i.e. dev time), are very welcome.


> 2. How do you make sure that somebody (a user, google-scholar...) reading that paper notices that the resource was provided via CLARIN?

Maybe we have a simple misunderstanding here. Look at the yellow box ("citation box") at the top of any of our repository resource pages. There is a formatted citation and also its export formats for BibTeX and CMDI. All of them mention the repository name. That is probably the best we can do: mention the repository/centre. Most of them have CLARIN in their name.

> And how do you make sure that also the national consortium that provides it (in this case Lindat (!)) is credited? And António and the University of Lisbon as well? In Linked Data and hypertext it might work to resolve everything on a click, but in a paper, even as PDF?

Authors and the repository/Clarin centre are taken care of by the reference in the paper's bibliography, as explained above. The authors' institutions get their credit indirectly in the Times and other university rankings that rely on the citations of their employees. At least that is my best answer.

> 3. How do you measure the use of CLARIN resources based on such citations?

Well, the PIDs of the resources are there. We would probably have to mine for them, just like LDC mines for their catalog IDs in publications. One of the reasons we try to provide the "formatted text" citation is the hope that the PID will appear in the paper and we will be able to mine for it later.
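
A minimal sketch of what such mining could look like, assuming the publication text is already available as a plain string and that the handle prefixes to look for are known in advance. The prefix 11372 is taken from the example above; the function name and the regular expression are illustrative assumptions and would need tuning for real PDFs (line breaks inside URLs, unusual suffix characters, and so on).

```python
import re
from collections import Counter

# Prefixes to look for; 11372 is the one from the example in this thread.
KNOWN_PREFIXES = {"11372"}

# Matches the URL form (http://hdl.handle.net/PREFIX/SUFFIX)
# as well as the hdl:PREFIX/SUFFIX form.
HANDLE_RE = re.compile(
    r"(?:https?://hdl\.handle\.net/|hdl:)(\d+(?:\.\d+)*)/([A-Za-z0-9._-]+)"
)


def count_handles(text, prefixes=KNOWN_PREFIXES):
    """Count occurrences of handles whose prefix is in `prefixes`."""
    counts = Counter()
    for prefix, suffix in HANDLE_RE.findall(text):
        if prefix in prefixes:
            # Drop a sentence-final full stop that the pattern may have swallowed.
            counts[f"{prefix}/{suffix.rstrip('.')}"] += 1
    return counts


sample = (
    "We used the corpus at http://hdl.handle.net/11372/LRT-386. "
    "See also hdl:11372/LRT-386 and http://hdl.handle.net/99999/other."
)
print(count_handles(sample))
```

In this sample both spellings of the same handle are counted together, while the handle with an unknown prefix is ignored.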

> I gave my suggestion in a previous e-mail, but I know that it is not ideal and may not solve all issues.

I am afraid that for the bibliometric part there is no ideal solution. We try to provide the first part for now: support making the correct citation. We currently have no resources to do the counting part. But one thing we have done is that we started talking to CrossRef to see if we could use their service with our generic handles, provided we send them the required metadata. There is no technical reason for it not to work, but we will see.

Best,
Pavel


> 
> Cheers
> 
> Thorsten
> 
> 
> 
> 
> 
> Am 02.10.15 um 10:33 schrieb Pavel Stranak:
>> Dear Thorsten and all,
>> 
>> thanks for using our example. Let me add a few clarifications.
>> 
>>> On 01 Oct 2015, at 21:44, Thorsten Trippel <thorsten.trippel at uni-tuebingen.de> wrote:
>>> 
>>> Dear Tomaž and all,
>>> 
>>> you are right, of course. Let's stick to the lindat example. Lindat uses handles and the metadata states the correct handle. But the URL to cite should not be the lindat site but the handle.
>> 
>> Which it clearly is. If you copy & paste the citation, you use the actionable (URLified) handle, which is incidentally the currently recommended form. Nowhere do we recommend using other URLs for citations.
>> 
>>> And of course a resolver can then easily find out that the handle belongs to the lindat repository. But if you want to find every citation of the resource you will have to look at every handle and find out if the handle is part of lindat (or some other CLARIN centre).
>> 
>> The repository is also mentioned explicitly in the citation text, following the Force 11 Joint Declaration of Data Citation Principles examples (see below).
>> 
>>> Now it would be easy(-ier) if each CLARIN centre used its own handle prefix.
>> 
>> Which we do. Feel free to resolve the handle in the example with the parameter "Don't Redirect to URLs" at http://hdl.handle.net. In fact we use two prefixes: one for the Clarin LRT Inventory records, one for our national resources. The reason is sustainability: being able to transfer the LRT Inventory to another centre, including the PID management (the whole prefix).
>> 
>>> In this case we could just search for the hdl:PREFIX (for every CLARIN centre). However this would still not be too obvious, non-CLARIN-ingroup persons would not be able to see that it is a CLARIN resource.
>> 
>> The "hdl:" format is not really recommended for citations AFAIK. I remember Larry Lannom explaining that the reason is that the scheme never got supported by browsers. Sounds sensible to me. Even though the URLified form is longer, we follow that recommendation.
>> 
>>> 
>>> I think there are two purposes for citations of resources:
>> 
>> I actually think there are a few more.
>> 
>> 1) Giving credit and citations to resource creators.
>> 
>> 2) Ensuring replicability of results. That requires citing resources directly via their PIDs (and proper versioning of the resources with new PIDs).
>> 
>> Those would be the most prominent reasons usually mentioned. For more see the Force 11 declaration: https://www.force11.org/group/joint-declaration-data-citation-principles-final
>> 
>>> 
>>> 1. automatic counting: a crawler (or google) looks at references and counts the number of references to CLARIN resources. If the handle is cited only, this crawler would have to know all CLARIN-centres prefixes and look at all handles or resolve all handles to see which ones of them point to CLARIN repositories. Though this would be possible technically, it may involve some work and performance might be tricky.
>>> 
>>> 2. making CLARIN resources visible to humans: handles are persistent, but for a human it is impossible to see a relation to CLARIN. These readers will not use a resolver to find out that it is a CLARIN resource. So we need to find a different way of attributing it to CLARIN somehow. If I say CLARIN, I would of course also include national consortia.
>>> 
>>> 
>>> One more thing about DOIs: DOIs are handles (!), see https://www.doi.org/factsheets/DOIHandle.html
>> 
>> Of course they are; this has been recognised by Clarin since the beginning of the preparation of the PID policy. Any Clarin centre is free to use DOIs as their PIDs.
>> 
>> To sum up, we have tried to make the format of our citation conform to the Force 11 declaration mentioned above (endorsed by Clarin ERIC: https://www.force11.org/datacitation/endorsements) and to the work of the RDA working group for Data Citation. We followed the examples published by Force 11: https://www.force11.org/node/4771, but we are of course really open to rational criticism and further improvements.
>> 
>> Best,
>> Pavel
>> 
> 
> 
> -- 
> ----------------------------------------------------------------------------
> ///////// Dr. Thorsten Trippel   thorsten.trippel at uni-tuebingen.de
>   //     Seminar für Sprachwissenschaft
>  //  //  Eberhard-Karls-Universität Tübingen
> //  //   Office:  Wilhelmstr. 19 #2.17
>    //    Phone:   +49 (0)7071-29-77352
> ///////// Federal Republic of Germany
> -----------------------------------------------------------------------------


