[Tf-curation] Part I: Re: AW: AW: License/Availability was WG: Re: LicenseAvailabilityMap.xml in vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac

Krister Lindén krister.linden at helsinki.fi
Sun Jan 31 22:26:07 CET 2016


Dear all,

I could have responded in one long email, but I chose to answer in three 
parts. As a result, I end up confirming here some of the things that Penny 
said in the next message, while also adding some explanations. Here goes.

Part I:

On 21.1.2016 17:04, Durco, Matej wrote:
> ACA vs. NC
> as you rightly commented in the gsheet
>
> 1)
>
> * ad: PUB/ACA/RES yes, the goal is to have one of these 3 categories
> assigned to each record/resource. ... the way it
> is solved in META-SHARE profiles brought me to the idea to decompose
> to license categories. Namely in META-SHARE profile the licenceInfo
> is quite rich: There is a licence-element and the repeated
> restrictionsOfUse element, so each record has more than enough
> information to correctly map to both the main categories and the
> optional ones. (see example Helsinki corpus [1]) Therefore I believe
> we can (and have to) be conservative in the mapping and can avoid
> adding uncertain information: Correct me if I am wrong but
> "non-commercial use" can be safely mapped to the License category
> "NC",

Yes it can. There is currently a legal debate going on about what 
non-commercial means exactly, because it is a fuzzy concept, but 
whenever that discussion arrives at a conclusion, which may be as soon 
as the new directive for some kind of research exception emerges, that 
is the definition we will adopt.

> however mapping it to the main category PUB ACA or RES is
> problematic without more information. Long story short, even though
> we cannot map each individual value in the normalization map to one
> of the main categories, in the end each and every record/resource (that
> provides the appropriate information) will be assigned to one of the
> three.

Agree.
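
To illustrate the conservative mapping discussed above, here is a minimal
sketch. It is not the actual VLO normalization code: the dictionary
NORMALIZATION, the function categorize, and the rule "assign a main
category only if exactly one is found" are all assumptions made for
illustration. Only the two example values and their categories come from
this thread.

```python
# Hypothetical sketch, not VLO code. Names and the tie-break rule are assumptions.
MAIN = {"PUB", "ACA", "RES"}

# Illustrative normalization entries: raw metadata value -> categories.
# "non-commercial use" maps only to NC; it says nothing about PUB/ACA/RES.
NORMALIZATION = {
    "non-commercial use": {"NC"},
    "CLARIN_ACA-NC": {"ACA", "NC"},
}

def categorize(values):
    """Return (main_category_or_None, set_of_other_categories) for one record."""
    cats = set()
    for value in values:
        # Unknown values contribute nothing rather than a guess.
        cats |= NORMALIZATION.get(value, set())
    found_main = cats & MAIN
    # Assumption: stay conservative and assign a main category only
    # when the record's values point to exactly one.
    main = next(iter(found_main)) if len(found_main) == 1 else None
    return main, cats - MAIN
```

A record carrying only "non-commercial use" gets NC but no main category,
while a record that also carries "CLARIN_ACA-NC" ends up in ACA with NC.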

> * ad: atomic vs. combined License Categories We want to try with
> decomposition, i.e. atomic categories as separate facet values (BY,
> SA, ...)

Good.

> * ad: indication of license/availability information being
> unavailable in the dev-instance we use a placeholder "[missing
> value]" (actually in all facets), but it needs to be decided if we
> want to expose this in the main vlo, especially given the many
> records falling into this category. we cannot say "non standard
> license", because we don't know. we can only say "unspecified" or
> synonyms thereof, "unspecified" being sometimes used as value
> itself.

Unspecified is OK.

> 2) ad: C-* facets It's actually the opposite, these are special
> facets exposing the values of individual concepts that contribute to the
> actual availability/license facets. (concept-facet mapping) The
> overview of these concepts, incl. definition (copied from the source)
> and links to CCR are in the trac-wiki [1]. These C-* facets are
> exactly meant to be able to identify, where the individual values in
> the availability facet came from. Identifying the underlying concepts
> is more-or-less the closest we can get you (easily), as VLO does not
> keep the information about which actual profile/element a given value
> comes from. (this is also in response to 2nd point of 3) ). However
> one can get this information in the detail-view (looking into full
> metadata record). Not super convenient, but well. And as said the
> ProfileName and DataProvider facets help you identify the provider
> and profile in question.

OK.

> ad: licence type vs. license type Yes, indeed there is both a
> "licence type" and a "license type" concept. they come from ISOcat
> still (DC-3800 and DC-5439 resp.) I added a snapshot from SMC-browser
> to the wiki page [2] showing where these two come from (in which
> profiles they are used and what was the context of these profiles).
> (also attaching the snapshot) It is 3 and 5 profiles using these, I
> guess it would be possible in this case to ask the authors (with the
> help from the CCR and CMDI team) to merge these two and correct the
> profiles accordingly.

Good idea.

> Two more points from my side: AFAI understood there is a conflict in
> the understanding of PUB/ACA/RES in CLARIN and in META-SHARE, in
> META-SHARE everything beyond CC-0 being of
> availability:restrictedUse. Is that correct? The example above [1]
> delivers also the CLARIN compliant licence (CLARIN_ACA-NC), but I
> doubt that this is the case for all META-SHARE records. So in my
> understanding we need to disregard the availability information in
> resourceInfo-profile and just regard the licence and
> restrictionsOfUse. Would you agree?

I seem to remember that META-SHARE made a point of declaring everything 
except CC0 restricted, which may be true from a legal point of view, 
although I don't think they used this for any particular purpose as all 
the regular CC licenses then also fall into the META-SHARE restricted 
category.

In CLARIN, the RES category was intended to be used for resources 
"restricted to individual use" typically containing personal data 
preventing them from being opened to a broader category of users. This 
is often referred to only as "restricted use" due to the RES acronym and 
therefore misinterpreted in view of the META-SHARE terminology.

> The next question that is not clear to me: - Is NC equivalent with
> ACA? Because then we have a problem with CC-NC?

No. NC is not equivalent to ACA.

In its basic form, ACA means "resources available for educational, 
teaching and research purposes" including commercial research, so we 
need NC to specify that an ACA resource is available only for 
non-commercial purposes.

In addition, ACA implies ID i.e. "A user needs to be authenticated or 
identified." and BY as that is required by law in most EU countries 
anyway. (This is why there is CC0 to explicitly say that we don't care 
about attribution.)

Authentication implies more than self-identification for collecting 
usage statistics, so someone needs to verify the identity. For this we 
need an affiliation to some community that can authenticate the user. 
We currently offer two flavors of affiliation: EDU and META. If nothing 
else is mentioned, EDU is assumed (which is the pure ACA), but if META is 
mentioned (by saying ACA+META), we also acknowledge that the META 
community, which includes industrial partners, may do the 
authentication. How they do it is up to them.

In contrast to ACA resources, we may also have resources available for 
any purpose that still require self-identification for collecting usage 
statistics, e.g. the IP address may be collected or some email address 
or whatever means of identification the distributor of the resource 
chooses. This does not restrict access to the resource to a particular 
community, so we can therefore put such resources in the category PUB+ID.

In order to be able to control the ID for authentication, ACA also 
implies NORED. If the resource could be distributed freely to other 
researchers, automated authentication could not be implemented and would 
also not make sense.

More generally the following implications hold:

  ACA => ID;BY;NORED
  ACA;META => ID;BY
  RES => ID;BY;NORED
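
As a minimal sketch, the three implications above could be applied to a
set of category labels as follows. The function name expand is
hypothetical and this is not taken from any existing CLARIN or VLO code;
it only encodes the three rules as listed.

```python
# Hypothetical sketch of the implications listed above; not VLO code.
def expand(categories):
    """Return the given categories plus everything they imply."""
    cats = set(categories)
    if "ACA" in cats:
        # ACA => ID;BY, and also NORED unless META is mentioned (ACA;META => ID;BY).
        cats |= {"ID", "BY"}
        if "META" not in cats:
            cats.add("NORED")
    if "RES" in cats:
        # RES => ID;BY;NORED
        cats |= {"ID", "BY", "NORED"}
    return cats
```

For example, expand({"ACA", "NC"}) adds ID, BY and NORED, whereas
expand({"PUB", "ID"}) returns the input unchanged, since PUB implies
nothing further in the list above.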

> I hope I did not add more confusion.

I hope my answers clarified some parts.

--
Krister

