[Tf-curation] l/a synthesis

Krister Lindén krister.linden at helsinki.fi
Wed Feb 10 15:49:13 CET 2016


Dear Ondřej,

As I currently understand it, we are interpreting the license information
into CMDI components. However, if the CLARIN Centers already provide CMDI
components for legal metadata, my understanding was that these would not
be overridden unless there is serious misuse. This will allow the Centers
to fill in missing information or correct errors in their metadata, which
they may be willing to do when they see that it actually has an effect.
If they do not provide explicit legal metadata components, they get the
automatic mapping provided by CLARIN ERIC.

One simple quality check on legal metadata provided as CMDI components 
by the centers is to look at the names of the licenses and see if a 
license name (modulo some spelling variations) always comes with the 
same set of labels. If there are conflicts, then either the license name 
is wrong or the labeling needs to be corrected. If there is disagreement 
on how to resolve the issue, CLIC may be of assistance.
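
The check described above could be sketched roughly as follows. This is
only a minimal illustration: the function names, the normalization rule
for spelling variants and the sample records are assumptions made for the
example, not part of the VLO or of any CLARIN tool.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lower-case the name and collapse punctuation/whitespace so that
    simple spelling variants of a license name compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def find_label_conflicts(records):
    """Group label sets by normalized license name and return only the
    names that occur with more than one distinct set of labels."""
    seen = defaultdict(set)
    for name, labels in records:
        seen[normalize_name(name)].add(frozenset(labels))
    return {name: sets for name, sets in seen.items() if len(sets) > 1}


# Hypothetical records: (license name, labels found in the CMDI component)
records = [
    ("CLARIN ACA-NC", ["ACA", "NC", "ID", "BY", "NORED"]),
    ("clarin_aca-nc", ["ACA", "NC"]),
    ("CC-BY 4.0", ["PUB", "BY"]),
    ("CC BY 4.0", ["BY", "PUB"]),
]

conflicts = find_label_conflicts(records)
for name, label_sets in conflicts.items():
    print(name, [sorted(s) for s in label_sets])
```

Any name reported here would need a human decision on whether the license
name or the labeling is wrong.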

Staff at repositories should have access to actual licenses and can use 
the license calculator 
https://www.clarin.eu/content/clarin-license-category-calculator to help 
them label their licenses in a consistent way. The license tags provided 
by the calculator have been quite extensively tested over the years, but 
life is full of surprises.
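
As an illustration of consistent labeling, the implications quoted
further down in this thread (ACA => ID;BY;NORED, ACA;META => ID;BY,
RES => ID;BY;NORED) could be applied mechanically to a tag set. The
following is a minimal sketch, assuming tags are plain strings; the
function name is hypothetical, and reading the ACA;META rule as "NORED is
not implied when META is present" is my interpretation for the example.

```python
def expand_tags(tags):
    """Return a copy of the tag set with the implied tags added."""
    expanded = set(tags)
    if "ACA" in expanded:
        # ACA => ID;BY;NORED, whereas ACA;META => ID;BY only
        expanded |= {"ID", "BY"}
        if "META" not in expanded:
            expanded.add("NORED")
    if "RES" in expanded:
        # RES => ID;BY;NORED
        expanded |= {"ID", "BY", "NORED"}
    return expanded


print(sorted(expand_tags({"ACA", "NC"})))
```

Tag sets that carry none of these main categories, such as PUB+ID, are
returned unchanged.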

Regards,
Krister

On 10.2.2016 15:29, Ondřej Košarko wrote:
> Hello everyone,
>
> I am getting slightly concerned with the various mentions that the
> license labeling should be coming from the producers (centers)
> themselves. Is that really so or am I misreading the discussion?
>
> As you've been writing all along, the labels are just an
> interpretation/summary of the actual conditions, so what happens if two
> centers choose to label the same license differently? Wouldn't that be
> confusing in the results? Or are the incoming labels meant to be used
> only for "not well known licenses"?
>
> Regards,
> Ondrej
>
> 2016-02-05 16:09 GMT+01:00 Krister Lindén <krister.linden at helsinki.fi
> <mailto:krister.linden at helsinki.fi>>:
>
>     Since we are interpreting what others have written, we need to take
>     a conservative view not to give the impression that the data is more
>     freely available than it is. We are currently in the process of
>     interpreting the usage conditions in the metadata by going from
>     "unspecified" to something slightly more informative. We therefore
>     interpret the resources to be as freely accessible as we safely can.
>     If the metadata provider is unhappy with how narrowly his legal
>     metadata is interpreted in the VLO, he can set the elements more
>     exactly in his CMDI metadata.
>
>     (Non-CLARIN Centers have the same opportunity. For a CLARIN Center,
>     CLARIN ERIC has added responsibility for auditing the quality of the
>     relationship between the licenses and the CMDI metadata.)
>
>     Penny's advice to be careful is good but it actually works in the
>     other direction based on the principle that one cannot give more
>     rights than one has. Without additional info, "for research" can be
>     narrowly approximated by at least +LRT without saying whether the
>     data is available for all (PUB), for a trusted community (ACA) or
>     upon personal request to the owner (RES). However, using only +LRT
>     would likely keep the resource in the "unspecified" main category,
>     whereas using ACA is a rather safe bet stating that the data is
>     available for research while also assuming that the downloader needs
>     to be identified to access the data, which in most cases is
>     unfortunately still true for research data that is not explicitly
>     licensed with one of the open or public licenses.
>
>     If we have additional knowledge that +ID is not required by the
>     license of the resource, then Penny's suggestion for PUB+research
>     can be narrowly approximated with tags saying that at least PUB+LRT
>     is safe to assume.
>
>     Regards,
>     Krister
>
>
>     On 5.2.2016 11:19, Penny Labropoulou wrote:
>
>         Dear Matej and Krister,
>         If indeed the VLO harvests only from CLARIN centres, and
>         following Krister's explanations, ok, let's have the tags explicit.
>         But I have the feeling that the VLO also harvests from other
>         sources, and these may not all include CMDI metadata or, even
>         more, a licensing category or even a licence (which should be
>         imposed at least for new data!); if this is the case, then we
>         are actually interpreting the providers' metadata, often just
>         free text statements, in which case we should be more careful, I
>         think. If the providers simply state "for research" and we
>         interpret that as ACA (as done in our Excel sheet), then the ID
>         tag may be more than what the original providers ask for; if
>         asked, they might have gone for PUB +research. In any case, the users
>         are directed through the VLO to where the resource itself is
>         made available, and there, the users will have to accept
>         whatever licensing conditions the provider asks for and it's up
>         to the source distributor to enforce it. If the source
>         distributor is a CLARIN centre, the ID will be imposed by our
>         own policy, and it's clearly stated, as Krister says, in the
>         agreement template.
>
>         I would like to see the ID being used as a way of facilitating
>         access to resources for researchers in a trusted federation such
>         as CLARIN, rather than a way of discouraging access to resources.
>         As said, if the VLO harvests only from CLARIN centres, just
>         disregard all the above.
>         Best,
>         Penny
>
>         -----Original Message-----
>         From: Krister Lindén [mailto:krister.linden at helsinki.fi
>         <mailto:krister.linden at helsinki.fi>]
>         Sent: Friday, February 05, 2016 3:55 AM
>         To: Durco, Matej <Matej.Durco at oeaw.ac.at
>         <mailto:Matej.Durco at oeaw.ac.at>>; Penny Labropoulou
>         <penny at ilsp.gr <mailto:penny at ilsp.gr>>; 'Twan Goosen'
>         <twan.goosen at mpi.nl <mailto:twan.goosen at mpi.nl>>; 'Sander
>         Maijers' <sander at clarin.eu <mailto:sander at clarin.eu>>
>         Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl
>         <mailto:menzo.windhouwer at meertens.knaw.nl>>; 'Thomas Eckart'
>         <teckart at informatik.uni-leipzig.de
>         <mailto:teckart at informatik.uni-leipzig.de>>; Ostojic, Davor
>         <Davor.Ostojic at oeaw.ac.at <mailto:Davor.Ostojic at oeaw.ac.at>>;
>         tf-curation at lists.clarin.eu
>         <mailto:tf-curation at lists.clarin.eu>; Sugimoto, Go
>         <Go.Sugimoto at oeaw.ac.at <mailto:Go.Sugimoto at oeaw.ac.at>>;
>         'Dieter Van Uytvanck' <dieter at clarin.eu <mailto:dieter at clarin.eu>>
>         Subject: Re: l/a synthesis
>
>         Dear Matej,
>
>         Regarding the explicitness of tags: In the current Agreement
>         templates for a normal ACA resource, the federated login,
>         attribution and no redistribution conditions are made explicit.
>         It would therefore be better to reflect this in the tags in the
>         VLO for this automated update.
>
>         Not making the tags explicit may cause liability for CLARIN in
>         some cases as it encourages unintended usage, whereas being
>         slightly too strict in the labeling will have no legal
>         implications. People will only be pleasantly surprised that some
>         resources are more widely usable than they imagined.
>
>         When CLARIN Centers provide their own licenses with tags already
>         marked as CMDI components, they will be responsible for the
>         labeling of their own licenses. If they do not e.g. require
>         login for their particular brand of licenses "for (teaching,
>         education and) research-purpose", they may leave out the ID tag,
>         but if the ID tag is only a non-explicit assumption via the
>         guidelines, the Centers will not even be able to leave it out,
>         as it should always be implicitly read into the tag set.
>         (Note that the assessment of the license labeling should be part
>         of the regular CLARIN Center assessment procedure.)
>
>         The "other" tag is there to draw the attention of the user to
>         peculiar but relevant usage conditions similar to "only to be
>         used on Tuesdays"
>         or the like. We can't have a tag for everything, but an asterisk
>         is an indication that this license has conditions out of the
>         ordinary. We are aware that recognizing what is out of the
>         ordinary may be non-trivial.
>
>         Regards,
>         Krister
>
>
>         On 3.2.2016 18:28, Durco, Matej wrote:
>
>             Dear Krister,
>
>             thank you a lot for the extensive response, this is really
>             very helpful!
>
>             In my view, your clarifications regarding NC/ACA and
>             derivative data should definitely find their way into the
>             public information about L/A [1].
>
>             The hint that the three main categories imply certain
>             subcategories (ACA => ID;BY;NORED) is also very helpful.
>             I just wonder, if we want to make it *explicit* in the VLO
>             (i.e. add for every resource with ACA tag, also the ID, BY
>             and NORED attributes), or just explain it (somewhere under
>             [1]) and leave that implicit in VLO.
>
>             In the list you proposed to map a few "restricted ..."
>             values with "*"
>             (or other), which seems a bit counterintuitive, but I guess
>             this has to do with the special meaning of "RES"... ?
>
>             The next steps:
>             We are currently processing the (Krister's) mapping into a
>             normalization map as used by the VLO.
>             We will apply it on our Minerva VLO instance first and let
>             you inspect the new mappings, probably on Monday.
>             We will also tentatively try to map from the dc-concepts
>             (dcterms:rights, dcterms:accessRights, dcterms:license), to
>             see if we can get better coverage (the profile coverage
>             analysis [2] suggests so). After a validation and comment
>             period of a few days, we would apply the mapping in the main
>             VLO instance (and roll it out with version 3.4).
>
>             There is one more thing we would like to have feedback on,
>             especially from CLIC: the labels and definitions for
>             the l/a related facets.
>             But I will save that for a separate email.
>
>             Thank you for all the input so far.
>
>             Best,
>             Matej
>
>             [1] https://www.clarin.eu/content/license-categories
>             [2] https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1
>
>
>             -----Original Message-----
>             From: Krister Lindén [mailto:krister.linden at helsinki.fi
>             <mailto:krister.linden at helsinki.fi>]
>             Sent: Sunday, January 31, 2016 10:26 PM
>             To: Durco, Matej <Matej.Durco at oeaw.ac.at
>             <mailto:Matej.Durco at oeaw.ac.at>>; Penny Labropoulou
>             <penny at ilsp.gr <mailto:penny at ilsp.gr>>; 'Twan Goosen'
>             <twan.goosen at mpi.nl <mailto:twan.goosen at mpi.nl>>; 'Sander
>             Maijers'
>             <sander at clarin.eu <mailto:sander at clarin.eu>>
>             Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl
>             <mailto:menzo.windhouwer at meertens.knaw.nl>>; 'Thomas
>             Eckart' <teckart at informatik.uni-leipzig.de
>             <mailto:teckart at informatik.uni-leipzig.de>>; Ostojic, Davor
>             <Davor.Ostojic at oeaw.ac.at
>             <mailto:Davor.Ostojic at oeaw.ac.at>>;
>             tf-curation at lists.clarin.eu
>             <mailto:tf-curation at lists.clarin.eu>; Sugimoto, Go
>             <Go.Sugimoto at oeaw.ac.at <mailto:Go.Sugimoto at oeaw.ac.at>>;
>             'Dieter Van Uytvanck' <dieter at clarin.eu
>             <mailto:dieter at clarin.eu>>
>             Subject: Part I: Re: AW: AW: [Tf-curation]
>             License/Availability was
>             WG: Re: LicenseAvailabilityMap.xml in
>             vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac
>
>             Dear all,
>
>             It seems I could have responded in one long email, but I
>             chose to answer in three different parts and therefore I end
>             up confirming some of the things here that Penny said in the
>             next message, while also adding some explanations, but here
>             goes.
>
>             Part I:
>
>             On 21.1.2016 17:04, Durco, Matej wrote:
>
>                 ACA vs. NC
>                 as you rightly commented in the gsheet
>
>                 1)
>
>                 * ad: PUB/ACA/RES: yes, the goal is to have one of these
>                 3 categories assigned to each record/resource. ... The
>                 way it is solved in META-SHARE profiles gave me the idea
>                 to decompose into license categories. Namely, in the
>                 META-SHARE profile the licenceInfo is quite rich: there
>                 is a licence element and the repeated restrictionsOfUse
>                 element, so each record has more than enough information
>                 to map correctly to both the main categories and the
>                 optional ones (see the example Helsinki corpus [1]).
>                 Therefore I believe we can (and have to) be conservative
>                 in the mapping and can avoid adding uncertain
>                 information. Correct me if I am wrong, but
>                 "non-commercial use" can be safely mapped to the License
>                 category "NC",
>
>
>             Yes it can. There is currently a legal debate going on about
>             what non-commercial means exactly, because it is a fuzzy
>             concept, but whenever that discussion arrives at a
>             conclusion, which may be as soon as the new directive for
>             some kind of research exception emerges, that is the
>             definition we will adopt.
>
>                 however, mapping it to the main category PUB, ACA or RES
>                 is problematic without more information. Long story
>                 short, even though we cannot map each individual value
>                 in the normalization map to one of the main categories,
>                 in the end each and every record/resource (that provides
>                 the appropriate information) will be assigned to one of
>                 the three.
>
>
>             Agree.
>
>                 * ad: atomic vs. combined License Categories: We want to
>                 try decomposition, i.e. atomic categories as separate
>                 facet values (BY, SA, ...)
>
>
>             Good.
>
>                 * ad: indication of license/availability information
>                 being unavailable: in the dev-instance we use a
>                 placeholder "[missing value]" (actually in all facets),
>                 but it needs to be decided if we want to expose this in
>                 the main VLO, especially given the many records falling
>                 into this category. We cannot say "non standard
>                 license", because we don't know. We can only say
>                 "unspecified" or synonyms thereof, "unspecified" being
>                 sometimes used as a value itself.
>
>
>             Unspecified is OK.
>
>                 2) ad: C-* facets: It's actually the opposite; these are
>                 special facets exposing the values of the individual
>                 concepts that contribute to the actual
>                 availability/license facets (concept-facet mapping). The
>                 overview of these concepts, incl. definitions (copied
>                 from the source) and links to CCR, is in the trac-wiki
>                 [1]. These C-* facets are meant precisely to make it
>                 possible to identify where the individual values in the
>                 availability facet came from. Identifying the underlying
>                 concepts is more or less the closest we can get you
>                 (easily), as the VLO does not keep the information about
>                 which actual profile/element a given value comes from
>                 (this is also in response to the 2nd point of 3)).
>                 However, one can get this information in the detail view
>                 (by looking into the full metadata record). Not super
>                 convenient, but well. And as said, the ProfileName and
>                 DataProvider facets help you identify the provider and
>                 profile in question.
>
>
>             OK.
>
>                 ad: licence type vs. license type: Yes, indeed there is
>                 both a "licence type" and a "license type" concept. They
>                 still come from ISOcat (DC-3800 and DC-5439 resp.). I
>                 added a snapshot from SMC-browser to the wiki page [2]
>                 showing where these two come from (in which profiles
>                 they are used and what the context of these profiles
>                 was); I am also attaching the snapshot. There are 3 and
>                 5 profiles using these, so I guess it would be possible
>                 in this case to ask the authors (with help from the CCR
>                 and CMDI team) to merge these two and correct the
>                 profiles accordingly.
>
>
>             Good idea.
>
>                 Two more points from my side: As far as I understood,
>                 there is a conflict in the understanding of PUB/ACA/RES
>                 in CLARIN and in META-SHARE, with everything beyond CC-0
>                 being of availability:restrictedUse in META-SHARE. Is
>                 that correct? The example above [1] also delivers the
>                 CLARIN-compliant licence (CLARIN_ACA-NC), but I doubt
>                 that this is the case for all META-SHARE records. So in
>                 my understanding we need to disregard the availability
>                 information in the resourceInfo-profile and just regard
>                 the licence and restrictionsOfUse. Would you agree?
>
>
>             I seem to remember that META-SHARE made a point of declaring
>             everything except CC0 restricted, which may be true from a
>             legal point of view, although I don't think they used this
>             for any particular purpose as all the regular CC licenses
>             then also fall into the META-SHARE restricted category.
>
>             In CLARIN, the RES category was intended to be used for
>             resources "restricted to individual use" typically
>             containing personal data preventing them from being opened
>             to a broader category of users. This is often referred to
>             only as "restricted use" due to the RES acronym and
>             therefore misinterpreted in view of the META-SHARE terminology.
>
>                 The next question that is not clear to me: Is NC
>                 equivalent to ACA? Because then we have a problem with
>                 CC-NC?
>
>
>             No. NC is not equivalent with ACA.
>
>             In its basic form, ACA means "resources available for
>             educational, teaching and research purposes" including
>             commercial research, so we need NC to specify that an ACA
>             resource is available only for non-commercial purposes.
>
>             In addition, ACA implies ID, i.e. "A user needs to be
>             authenticated or identified.", and BY, as that is required
>             by law in most EU countries anyway. (This is why there is
>             CC0 to explicitly say that we don't care about attribution.)
>
>             Authentication implies more than self-identification for
>             collecting usage statistics, so someone needs to verify the
>             identity. For this we need an affiliation to some community
>             that can authenticate the user.
>             We currently offer two flavors of affiliation: EDU and META.
>             If nothing else is mentioned, EDU is assumed (which is the
>             pure ACA), but if META is mentioned (by saying ACA+META), we
>             also acknowledge that the META community, which includes
>             industrial partners, may do the authentication. How they do
>             it is up to them.
>
>             In contrast to ACA resources, we may also have resources
>             available for any purpose that still require
>             self-identification for collecting usage statistics, e.g.
>             the IP address may be collected, or some email address, or
>             whatever means of identification the distributor of the
>             resource chooses. This does not restrict access to the
>             resource to a particular community, so we can put
>             such resources in the category PUB+ID.
>
>             In order to be able to control the ID for authentication,
>             ACA also implies NORED. If the resource could be distributed
>             freely to other researchers, automated authentication could
>             not be implemented and would also not make sense.
>
>             More generally the following implications hold:
>
>                  ACA => ID;BY;NORED
>                  ACA;META => ID;BY
>                  RES => ID;BY;NORED
>
>                 I hope I did not add more confusion.
>
>
>             I hope my answers clarified some parts.
>
>             --
>             Krister
>
>
>
>     _______________________________________________
>     Tf-curation mailing list
>     Tf-curation at lists.clarin.eu <mailto:Tf-curation at lists.clarin.eu>
>     https://lists.clarin.eu/cgi-bin/mailman/listinfo/tf-curation
>
>

