[Tf-curation] lindat profile mappings to VLO

Pavel Stranak stranak at ufal.mff.cuni.cz
Fri Jul 1 23:21:02 CEST 2016


Just a clarification:

Licensing info is sometimes present in VLO, but it is sometimes not perfect, e.g. when it says that CC-BY-SA is "only for research purposes": https://vlo.clarin.eu/record?4&docId=http_58__47__47_hdl.handle.net_47_11022_47_0000-0000-7FCC-D&q=treebank&index=8&count=267

Other times it is really missing, even though it is somehow represented in CMDI, e.g. https://vlo.clarin.eu/record?6&docId=http_58__47__47_hdl.handle.net_47_11234_47_1-1469_64_format_61_cmdi&q=bsd&index=1&count=27

This is clearly not enough. But if we display the license always correctly, will that be enough? 
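
For readers who want the "Resource" versus "LandingPage" distinction from the quoted discussion below in concrete form, here is a minimal Python sketch of how a consumer of CMDI records could group ResourceProxy entries by type and prefer the landing page over direct resource links. It is only an illustration under stated assumptions: the sample record is heavily trimmed, uses the CMDI 1.1 namespace, and its URLs are placeholders; the function names are hypothetical and this is not how VLO itself is implemented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed CMDI record; the URLs are placeholders.
SAMPLE_CMDI = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="lp_1">
        <ResourceType>LandingPage</ResourceType>
        <ResourceRef>http://example.org/repository/handle/123</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="res_1">
        <ResourceType mimetype="application/zip">Resource</ResourceType>
        <ResourceRef>http://example.org/repository/bitstream/123/data.zip</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def proxies_by_type(cmdi_xml):
    """Map each ResourceType value to the list of its ResourceRef URLs."""
    grouped = defaultdict(list)
    for elem in ET.fromstring(cmdi_xml).iter():
        if local_name(elem.tag) != "ResourceProxy":
            continue
        # Collect the text of the direct children, keyed by element name.
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if fields.get("ResourceType") and fields.get("ResourceRef"):
            grouped[fields["ResourceType"]].append(fields["ResourceRef"])
    return dict(grouped)


def primary_link(cmdi_xml):
    """Prefer the landing page; fall back to the first direct resource link."""
    grouped = proxies_by_type(cmdi_xml)
    for kind in ("LandingPage", "Resource"):
        if grouped.get(kind):
            return grouped[kind][0]
    return None


print(primary_link(SAMPLE_CMDI))
```

Matching on local element names rather than a fixed namespace is a deliberate simplification, so the same sketch would tolerate records that declare a different CMDI namespace.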

-Pavel




> On 1 Jul 2016, at 22:51, Pavel Stranak <stranak at ufal.mff.cuni.cz> wrote:
> 
> Francesca, this is a very interesting problem. Thanks for bringing it to our attention.
> 
> Thomas,
> 
> we should not replay the discussions Jozef mentioned here; we should read them. That being said, I need to react, briefly.
> 
> As for just linking to the data, we really cannot. And neither should anybody else, except probably for public domain data. Jozef is correct in the core fact that you cannot yank the data out of the context of the licensing and display it by direct link anywhere you please, just because it is convenient. We would at the very least have to consult CLIC about this, but read for instance the simplest of licenses, BSD 2-clause. The 2 clauses are as follows:
> 
> <quote from https://opensource.org/licenses/BSD-2-Clause>
> 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
> 
> 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
> </quote>
> 
> Other licenses contain similar clauses; I checked at least CC. I think it clearly means you should not make the data accessible without the license. We could discuss technicalities, such as whether linking counts as redistribution, but come on, this is not a court. To me it would clearly break the above license.
> 
> - So please understand that we need to follow the licenses. 
>  - It could get better soon, when we get to a consensus on how to display licensing information in VLO. Then we can re-populate the "Resource"-typed ResourceProxies without breaking the licenses. But see below my summary.
> - If there is a consensus that we should keep the resources empty instead of linking to something, we can do that. But an empty resource list is bad too, since the record then looks like those hundreds of empty metadata-only legacy records in LRT Inventory. So we lose the very important fact that this record actually has data, perfectly available online.
> 
> In summary, any way I look at it, I believe VLO cannot – and should not try to – substitute the original rich context the repository developers have been developing with their resources in mind. Even when VLO has licenses to read, it will miss bitstream metadata, data previews and what not. So de-prioritising the resource links and prioritising the Landing Page link would be the best option, in my opinion.
> 
> Cheers,
> Pavel
> 
> 
> 
> 
>> On 1 Jul 2016, at 13:22, Thomas Eckart <teckart at informatik.uni-leipzig.de> wrote:
>> 
>> Hi Francesca,
>> 
>>> Any suggestion?
>>> We would like to keep a common policy for all centers using the LINDAT
>>> software.
>> 
>>> Another thing, in our version of LINDAT the field "Demo URL" isn't
>>> generating any resource proxy in the resulting CMDI.
>>> Maybe that would be a more suitable candidate ...
>>> 
>>> See this resource and its CMDI for reference.
>>> https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1741
>> "Demo URL" ("A url with samples of the resource or, in the case of tools, of samples of the output.") does indeed sound like a better candidate for a resource reference. Beyond that I can't say anything about either of your problems as I am not involved in the LINDAT depositing service. Maybe someone working on the tool can jump in...
>> 
>> Best,
>> Thomas
>> 
>> 
>> 
>> 
>> 
>>> 2016-06-30 16:47 GMT+02:00 Thomas Eckart
>>> <teckart at informatik.uni-leipzig.de
>>> <mailto:teckart at informatik.uni-leipzig.de>>:
>>> 
>>>       For 2) in fact what I object to most is not the lack of direct
>>>       download but the use of what for lindat users is dubbed
>>>       "project page".
>>> 
>>>   I agree. A ResourceProxy of type "Resource" should normally point to
>>>   the actual resource (like the concrete corpus, tool, webservices etc.).
>>> 
>>>       So either we give a clear semantic to that field in our repo or my
>>>       modest opinion is that I would rather see a link back to our
>>>       repo on the resources icon of the vlo.
>>> 
>>>       Once redirected to lindat-clarin or ilc4clarin, users can see
>>>       all info including the license and download if they have the
>>>       permission.
>>>       What is your opinion of this solution?
>>>       I saw other repos do this...
>>> 
>>>   That is definitely one way to do it and already more helpful for the
>>>   end user as the status quo for "Deltacorpus 1.1".
>>> 
>>>   Nonetheless, I think that this is not a good solution: the CMDI
>>>   specification allows 5 types of ResourceProxys. A reference to the
>>>   original context of the resource in its repository is already part
>>>   of it ("LandingPage"). Right now this reference may not be prominent
>>>   enough at the VLO record page (maybe we find a better solution?),
>>>   but it is already there.
>>> 
>>>   The "Resource"-typed ResourceProxys are defined (at least regarding
>>>   the upcoming CMDI 1.2 specification) as references to "A resource
>>>   that is described in the present CMDI instance, e.g. a text
>>>   document, media file or tool.". If you don't want to directly link
>>>   to the actual data, I guess omitting these ResourceProxys would be
>>>   the standard-compliant way to go.
>>>   I personally think that direct links to the data should be part of
>>>   the resource description, because it increases the usefulness of the
>>>   VLO for the end user and of the resource in the context of the whole
>>>   infrastructure.
>>> 
>>>   Best,
>>>   Thomas
>>> 
>>> 
>>> 
>>>   --
>>>   Thomas Eckart
>>>   Natural Language Processing Group
>>>   Department of Computer Science
>>>   University of Leipzig
>>>   Augustusplatz 10
>>>   04109 Leipzig, Germany
>>> 
>>> 
>> 
>> _______________________________________________
>> Tf-curation mailing list
>> Tf-curation at lists.clarin.eu
>> https://lists.clarin.eu/cgi-bin/mailman/listinfo/tf-curation


