[Tf-curation] l/a synthesis - unavailable

Krister Lindén krister.linden at helsinki.fi
Sun Feb 21 19:29:06 CET 2016


Matej,

RES indicates that the resource is somehow available to an individual if 
the person fulfills the declared conditions. There is no specific limit 
on how cumbersome the conditions can be for the resource still to be 
classified as RES. However, if you have to be prepared to negotiate the 
license and clear the rights to get access, a resource is normally 
classified as unavailable, because negotiation is always possible and 
therefore says nothing about whether the resource is actually available.

In the end, I guess it comes down to how much effort we wish to spend on 
distinguishing not yet negotiated licenses from licenses that have not 
been properly labeled, but for now I would be inclined to keep the 
unavailable resources in the Unspec category as well.

Regards,
Krister

[PS. For an alternative solution you could argue as follows:

If you know that there is currently no license for a resource, from a 
metadata point of view it could perhaps be interpreted as "RES *". In 
that case, it is not a question of the license having deficient legal 
metadata (which you wish to indicate by Unspec), but on the contrary, 
you know quite well that the license is not yet negotiated.

In addition, the asterisk indicates non-standard conditions, i.e. some 
other condition than the conditions we already have tags for. The need 
to negotiate is a rather non-standard condition so in that case I guess 
it is safe to say that "the resource is available to an individual on 
non-standard conditions (= RES *)".

As the asterisk refers to a real license, this should be acknowledged in 
the license type field, e.g. with an exclamation symbol indicating a 
"not yet negotiated license type", in which case the asterisk refers to 
non-standard conditions in an unknown license so as not to give the 
impression that this is a pre-negotiated CLARIN RES license with 
non-standard conditions.]
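
To make the two options concrete, here is a minimal sketch in Python. It is only an illustration: the names `Availability`, `classify_unavailable`, `UNAVAILABLE_VALUES` and the `use_res_star` switch are assumptions made up for this example and are not taken from the VLO code or from any existing normalization map.

```python
from dataclasses import dataclass
from typing import Optional

# The four main availability categories named in this thread.
MAIN_CATEGORIES = ("PUB", "ACA", "RES", "Unspec")

# Hypothetical raw metadata values that mean "no license negotiated yet".
UNAVAILABLE_VALUES = {"unavailable", "not available"}


@dataclass(frozen=True)
class Availability:
    main: str               # one of MAIN_CATEGORIES
    other: bool = False     # True is rendered as the asterisk, e.g. "RES *"
    license_type: str = ""  # "!" is a hypothetical "not yet negotiated" marker

    def label(self) -> str:
        """Render the main category, with an asterisk for non-standard conditions."""
        return f"{self.main} *" if self.other else self.main


def classify_unavailable(raw_value: str,
                         use_res_star: bool = False) -> Optional[Availability]:
    """Map an 'unavailable' value to a category; return None for any other value."""
    if raw_value.strip().lower() not in UNAVAILABLE_VALUES:
        return None
    if use_res_star:
        # Alternative from the PS: available on non-standard conditions,
        # with the license type flagged as not yet negotiated.
        return Availability(main="RES", other=True, license_type="!")
    # Option preferred for now: keep unavailable resources in Unspec.
    return Availability(main="Unspec")
```

With the default setting an "unavailable" value ends up in Unspec. With `use_res_star=True` it is labeled "RES *" and carries the "!" license type, so that it is not mistaken for a pre-negotiated CLARIN RES license.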



On 19.2.2016 19:03, Durco, Matej wrote:
> As usual I have to have the last word ;)
>
> This is one more request for comment mainly for Krister and Penny:
>
> Only while processing the suggested mappings did we notice that Krister and Penny had mapped some values to unavailable and undefined.
> While I think we can safely reduce undefined to unspecified (meaning any information about l/a is missing),
> it seems that you (Krister, Penny) would like to see it emphasized if the metadata explicitly states that the resource is "not available" at all.
> However, I thought we agreed to have just the 4 main categories (PUB/ACA/RES/Unspec).
> So the question is: would you propose to have it as a 5th main category,
> or wouldn't it be enough to map it to RES;Other = RES*?
> I really wouldn't like to see such an unpleasant category as "totally unavailable" being pushed into the attention of the users. I would hope that RES* actually covers the case.
>
> Thank you for commenting on this.
>
> best,
> Matej
>
>
> -----Original Message-----
> From: Twan Goosen [mailto:twan.goosen at mpi.nl]
> Sent: Friday, 19 February 2016 16:13
> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Krister Lindén <krister.linden at helsinki.fi>; Penny Labropoulou <penny at ilsp.gr>; 'Sander Maijers' <sander at clarin.eu>
> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto, Go <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>
> Subject: Re: AW: l/a synthesis
>
> Hi everyone,
>
> On 12/02/16 13:55, Durco, Matej wrote:
>> [..]
>> You will be able to explore the mapping in the usual way on Minerva
>> beginning next week but also in the new much more intuitive way as Twan has now implemented in the beta release.
>> The facet will feature only the four main categories for search, but
>> the secondary categories will be displayed with individual records.
>> All represented by dedicated icons, giving a nice visual cue.
>> I guess Twan will introduce this in detail next week.
> We managed to deploy the beta version, just in time to not break Matej's promise ;) You can find it at <http://beta-vlo.clarin.eu/>. Notice that the importer is still running as I'm writing this, so more records will become available between now and ~5pm CET.
>
> To guide you a little bit, here are some demonstration queries/pages:
> * Records open for public or academic access:
> <http://beta-vlo.clarin.eu/search?fqType=availability:or&fq=availability:PUB&fq=availability:ACA>
> * Records that provide their exact licence:
>       <http://beta-vlo.clarin.eu/search?q=license:%7B*+TO+*%5D>
> * Example of a record that provides a nice amount of legal information:
> <http://beta-vlo.clarin.eu/record?docId=http_58__47__47_hdl.handle.net_47_11858_47_00-203C-0000-0023-8323-1>
>
> Two functional TODOs still to be completed before the stable release:
> - Boosting of public/academic records in search results (this could also reduce the slightly awkward sight of a front page filled with "?" level
> records)
> - Addition of missing icons for secondary availability 'laundry tags'
> (see e.g.
> <http://beta-vlo.clarin.eu/record?q=availability:FF&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-201403262>)
>
> Please play around with the availability level selector, have a look at the availability/licence information in the search results and on the record pages and the way they are presented. The descriptions shown for the various 'tags' (as tooltips on the search page or shown next to the icons on the record page) can still be improved, and so can the icons and of course the mapping itself. So, to quote Matej:
>> Thank you again for now
>> and we are looking forward to your comments and observations on the new release next week.
> And have a nice weekend :)
>
> Best,
> Twan
>
> P.S. you can find a 'complete' list of changes in this version at <https://github.com/clarin-eric/VLO/blob/development/CHANGES.txt> if you're interested.
>>
>> Regarding the facet labels and definition, we have now decided to stay with
>> the "availability" label, as it seems to better reflect the user's point of view.
>> This will then also be the only facet with an explicit label,
>> because it will be among the "search facets". The other facets
>> will be processed in a more verbose form. You will see next week ;)
>>
>>
>> Best,
>> Matej
>>
>> -----Original Message-----
>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>> Sent: Friday, 05 February 2016 16:09
>> To: Penny Labropoulou <penny at ilsp.gr>; Durco, Matej
>> <Matej.Durco at oeaw.ac.at>; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander
>> Maijers' <sander at clarin.eu>
>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
>> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto, Go
>> <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>
>> Subject: Re: l/a synthesis
>>
>> Since we are interpreting what others have written, we need to take a conservative view not to give the impression that the data is more freely available than it is. We are currently in the process of interpreting the usage conditions in the metadata by going from "unspecified" to something slightly more informative. We therefore interpret the resources to be as freely accessible as we safely can. If the metadata provider is unhappy with how narrowly his legal metadata is interpreted in the VLO, he can set the elements more exactly in his CMDI metadata.
>>
>> (Non-CLARIN Centers have the same opportunity. For a CLARIN Center,
>> CLARIN ERIC has added responsibility for auditing the quality of the
>> relationship between the licenses and the CMDI metadata.)
>>
>> Penny's advice to be careful is good but it actually works in the other direction based on the principle that one cannot give more rights than one has. Without additional info, "for research" can be narrowly approximated by at least +LRT without saying whether the data is available for all (PUB), for a trusted community (ACA) or upon personal request to the owner (RES). However, using only +LRT would likely keep the resource in the "unspecified" main category, whereas using ACA is a rather safe bet stating that the data is available for research while also assuming that the downloader needs to be identified to access the data, which in most cases is unfortunately still true for research data that is not explicitly licensed with one of the open or public licenses.
>>
>> If we have additional knowledge that +ID is not required by the license of the resource, then Penny's suggestion for PUB+research can be narrowly approximated with tags saying that at least PUB+LRT is safe to assume.
>>
>> Regards,
>> Krister
>>
>>
>> On 5.2.2016 11:19, Penny Labropoulou wrote:
>>> Dear Matej and Krister,
>>> If indeed the VLO harvests only from CLARIN centres, and following Krister's explanations, ok, let's have the tags explicit.
>>> But I have the feeling that the VLO also harvests from other sources, and these may not all include CMDI metadata or, even more, a licensing category or even a licence (which should be imposed at least for new data!); if this is the case, then we are actually interpreting the providers' metadata, often just free text statements, in which case we should be more careful, I think. If the providers simply state "for research" and we interpret that as ACA (as done in our excel), then the ID tag may be more than what the original providers ask for; if asked, they might have gone for PUB +research. In any case, the users are directed through the VLO to where the resource itself is made available, and there, the users will have to accept whatever licensing conditions the provider asks for and it's up to the source distributor to enforce it. If the source distributor is a CLARIN centre, the ID will be imposed by our own policy, and it's clearly stated, as Krister says, in the agreement template.
>>> I would like to see the ID being used as a way of facilitating access to resources for researchers in a trusted federation such as CLARIN, rather than a way of discouraging access to resources.
>>> As said, if the VLO harvests only from CLARIN centres, just disregard all the above.
>>> Best,
>>> Penny
>>>
>>> -----Original Message-----
>>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>>> Sent: Friday, February 05, 2016 3:55 AM
>>> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Penny Labropoulou
>>> <penny at ilsp.gr>; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander Maijers'
>>> <sander at clarin.eu>
>>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
>>> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto, Go
>>> <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>
>>> Subject: Re: l/a synthesis
>>>
>>> Dear Matej,
>>>
>>> Regarding the explicitness of tags: In the current Agreement templates for a normal ACA resource, the federated login, attribution and no redistribution conditions are made explicit. It would therefore be better to reflect this in the tags in the VLO for this automated update.
>>>
>>> Not having the tags explicitly may cause liability for CLARIN in some cases as it encourages unintended usage, whereas being slightly too strict in the labeling will have no legal implications. People will only be pleasantly surprised that some resources are more widely useable than they imagined.
>>>
>>> When CLARIN Centers provide their own licenses with tags already marked as CMDI components, they will be responsible for the labeling of their own licenses. If they do not e.g. require login for their particular brand of licenses "for (teaching, education and) research-purpose", they may leave out the ID tag, but if the ID tag is only a non-explicit assumption via the guidelines, the Centers will not even be able to leave it out, as it should always be implicitly read into the tag set.
>>> (Note that the assessment of the license labeling should be part of
>>> the regular CLARIN Center assessment procedure.)
>>>
>>> The "other" tag is there to draw the attention of the user to peculiar but relevant usage conditions similar to "only to be used on Tuesdays"
>>> or the like. We can't have a tag for everything, but an asterisk is an indication that this license has conditions out of the ordinary. We are aware that recognizing what is out of the ordinary may be non-trivial.
>>>
>>> Regards,
>>> Krister
>>>
>>>
>>> On 3.2.2016 18:28, Durco, Matej wrote:
>>>> Dear Krister,
>>>>
>>>> thank you a lot for the extensive response, this is really very helpful!
>>>>
>>>> In my view, your clarifications regarding NC/ACA and derivative data should definitely find their way into the public information about L/A [1].
>>>>
>>>> The hint that the three main categories imply certain subcategories (ACA => ID;BY;NORED) is also very helpful.
>>>> I just wonder, if we want to make it *explicit* in the VLO (i.e. add for every resource with ACA tag, also the ID, BY and NORED attributes), or just explain it (somewhere under [1]) and leave that implicit in VLO.
>>>>
>>>> In the list you proposed to map a few "restricted ..." values with "*"
>>>> (or other), which seems a bit counterintuitive, but I guess this has to do with the special meaning of "RES"... ?
>>>>
>>>> The next steps:
>>>> We are currently converting Krister's mapping into a normalization map as used by the VLO.
>>>> We will apply it on our Minerva VLO instance first and let you inspect the new mappings, probably on Monday.
>>>> We will also tentatively try to map from the dc-concepts
>>>> (dcterms:rights, dcterms:accessRights, dcterms:license), to see if we can get better coverage (the profile coverage analysis [2] suggests so).
>>>> After a validation and comment period of a few days, we would apply the mapping in the main VLO instance (and roll out with version 3.4).
>>>>
>>>> There is one more thing we would like to have feedback on, especially from CLIC: the labels and definitions for the l/a related facets.
>>>> But I will save that for a separate email.
>>>>
>>>> Thank you for all the input so far.
>>>>
>>>> Best,
>>>> Matej
>>>>
>>>> [1] https://www.clarin.eu/content/license-categories
>>>> [2] https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>>>> Sent: Sunday, 31 January 2016 22:26
>>>> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Penny Labropoulou
>>>> <penny at ilsp.gr>; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander Maijers'
>>>> <sander at clarin.eu>
>>>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
>>>> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>>>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto,
>>>> Go <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck'
>>>> <dieter at clarin.eu>
>>>> Subject: Part I: Re: AW: AW: [Tf-curation] License/Availability was
>>>> WG: Re: LicenseAvailabilityMap.xml in
>>>> vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac
>>>>
>>>> Dear all,
>>>>
>>>> It seems I could have responded in one long email, but I chose to answer in three different parts and therefore I end up confirming some of the things here that Penny said in the next message, while also adding some explanations, but here goes.
>>>>
>>>> Part I:
>>>>
>>>> On 21.1.2016 17:04, Durco, Matej wrote:
>>>>> ACA vs. NC
>>>>> as you rightly commented in the gsheet
>>>>>
>>>>> 1)
>>>>>
>>>>> * ad: PUB/ACA/RES yes, the goal is to have one of these 3 categories
>>>>> assigned to each record/resource. ... the way it is solved in
>>>>> META-SHARE profiles brought me to the idea to decompose to license
>>>>> categories. Namely in META-SHARE profile the licenceInfo is quite
>>>>> rich: There is a licence-element and the repeated restrictionsOfUse
>>>>> element, so each record has more than enough information to
>>>>> correctly map to both the main categories and the optional ones.
>>>>> (see example Helsinki corpus [1]) Therefore I believe we can (and
>>>>> have to) be conservative in the mapping and can avoid adding
>>>>> uncertain
>>>>> information: Correct me if I am wrong but "non-commercial use" can
>>>>> be safely mapped to the License category "NC",
>>>> Yes it can. There is currently a legal debate going on about what non-commercial means exactly, because it is a fuzzy concept, but whenever that discussion arrives at a conclusion, which may be as soon as the new directive for some kind of research exception emerges, that is the definition we will adopt.
>>>>
>>>>> however mapping it to the main category PUB ACA or RES is
>>>>> problematic without more information. Long story short, even though
>>>>> we cannot map each individual value in the normalization map to one
>>>>> of the main categories, in the end each and every record/resource (that
>>>>> provides the appropriate information) will be assigned to one of the three.
>>>> Agree.
>>>>
>>>>> * ad: atomic vs. combined License Categories We want to try with
>>>>> decomposition, i.e. atomic categories as separate facet values (BY,
>>>>> SA, ...)
>>>> Good.
>>>>
>>>>> * ad: indication of license/availability information being
>>>>> unavailable in the dev-instance we use a placeholder "[missing
>>>>> value]" (actually in all facets), but it needs to be decided if we
>>>>> want to expose this in the main vlo, especially given the many
>>>>> records falling into this category. we cannot say "non standard license", because we don't know.
>>>>> we can only say "unspecified" or synonyms thereof, "unspecified"
>>>>> being sometimes used as value itself.
>>>> Unspecified is OK.
>>>>
>>>>> 2) ad: C-* facets It's actually the opposite: these are special
>>>>> facets exposing the values of the individual concepts that contribute to
>>>>> the actual availability/license facets (concept-facet mapping). The
>>>>> overview of these concepts, incl. definitions (copied from the
>>>>> source) and links to CCR, is in the trac-wiki [1]. These C-* facets
>>>>> are meant precisely to let you identify where the individual
>>>>> values in the availability facet came from. Identifying the
>>>>> underlying concepts is more or less the closest we can get you
>>>>> (easily), as the VLO does not keep track of which actual profile/element a given value comes from.
>>>>> (this is also in response to 2nd point of 3) ). However one can get
>>>>> this information in the detail-view (looking into full metadata
>>>>> record). Not super convenient, but well. And as said the
>>>>> ProfileName and DataProvider facets help you identify the provider
>>>>> and profile in question.
>>>> OK.
>>>>
>>>>> ad: licence type vs. license type Yes, indeed there is both a
>>>>> "licence type" and a "license type" concept. They still come from
>>>>> ISOcat (DC-3800 and DC-5439 resp.). I added a snapshot from SMC-browser to
>>>>> the wiki page [2] showing where these two come from (in which
>>>>> profiles they are used and what the context of these profiles was;
>>>>> also attaching the snapshot). 3 and 5 profiles use these, so I
>>>>> guess it would be possible in this case to ask the authors (with
>>>>> the help from the CCR and CMDI team) to merge these two and correct
>>>>> the profiles accordingly.
>>>> Good idea.
>>>>
>>>>> Two more points from my side: AFAI understood there is a conflict
>>>>> in the understanding of PUB/ACA/RES in CLARIN and in META-SHARE, in
>>>>> META-SHARE everything beyond CC-0 being of availability:restrictedUse.
>>>>> Is that correct? The example above [1] delivers also the CLARIN
>>>>> compliant licence (CLARIN_ACA-NC), but I doubt that this is the
>>>>> case for all META-SHARE records. So in my understanding we need to
>>>>> disregard the availability information in resourceInfo-profile and
>>>>> just regard the licence and restrictionsOfUse. Would you agree?
>>>> I seem to remember that META-SHARE made a point of declaring everything except CC0 restricted, which may be true from a legal point of view, although I don't think they used this for any particular purpose as all the regular CC licenses then also fall into the META-SHARE restricted category.
>>>>
>>>> In CLARIN, the RES category was intended to be used for resources "restricted to individual use" typically containing personal data preventing them from being opened to a broader category of users. This is often referred to only as "restricted use" due to the RES acronym and therefore misinterpreted in view of the META-SHARE terminology.
>>>>
>>>>> The next question that is not clear to me: - Is NC equivalent with
>>>>> ACA? Because then we have a problem with CC-NC?
>>>> No. NC is not equivalent with ACA.
>>>>
>>>> In its basic form, ACA means "resources available for educational, teaching and research purposes" including commercial research, so we need NC to specify that an ACA resource is available only for non-commercial purposes.
>>>>
>>>> In addition, ACA implies ID i.e. "A user needs to be authenticated
>>>> or identified." and BY as that is required by law in most EU
>>>> countries anyway. (This is why there is CC0 to explicitly say that
>>>> we don't care about attribution.)
>>>>
>>>> Authentication implies more than self-identification for collecting usage statistics, so someone needs to verify the identity. For this we need an affiliation to some community that can authenticate the user.
>>>> We currently offer two flavors of affiliation: EDU and META. If nothing else is mentioned, EDU is assumed (which is the pure ACA), but if META is mentioned (by saying ACA+META), we also acknowledge that the META community, which includes industrial partners, may do the authentication. How they do it is up to them.
>>>>
>>>> In contrast to ACA resources, we may also have resources available for any purpose that still require self-identification for collecting usage statistics, e.g. the IP address may be collected or some email address or whatever means of identification the distributor of the resource chooses. This does not restrict access to the resource to a particular community, so we can put such resources in the category PUB+ID.
>>>>
>>>> In order to be able to control the ID for authentication, ACA also implies NORED. If the resource could be distributed freely to other researchers, automated authentication could not be implemented and would also not make sense.
>>>>
>>>> More generally the following implications hold:
>>>>
>>>>       ACA => ID;BY;NORED
>>>>       ACA;META => ID;BY
>>>>       RES => ID;BY;NORED
>>>>
>>>>> I hope I did not add more confusion.
>>>> I hope my answers clarified some parts.
>>>>
>>>> --
>>>> Krister
>>>>
>>>
>
>

