[Tf-curation] Part II: Re: AW: License/Availability was WG: Re: LicenseAvailabilityMap.xml in vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac

Krister Lindén krister.linden at helsinki.fi
Sun Jan 31 22:26:36 CET 2016


Dear all,

This part focuses on some aspects of derivative data and its implications.

Part II:

On 26.1.2016 16:44, Penny Labropoulou wrote:
> - I have
> added a NOTICE tag; I have used this mainly in FOSS licences (e.g.
> Apache, GNU licences) because what they require is to retain in the
> modified versions "the copyright notice" (and BTW, also the policy
> notice, but we don't have any tag for that); I have the feeling that
> this is not exactly the same with "attribution" which asks for
> explicit acknowledgement of the author/creator/owner of the resource,
> even when using the resource. For instance, if I write a paper and
> use a tagger to tag a corpus, I have to attribute the corpus owner
> (if licensed with CC-BY) but I have no such obligation for the use of
> the tagger (if licensed with GPL). If my understanding is incorrect,
> please change them all to BY. Krister, what do you think?

You are correct that the copyright notice only covers the claim of 
ownership of the software but makes no claim on attribution when using 
the software. However, the story does not end there.

Using software on a resource does not normally attach any copyright to 
the resource that is processed, unless this is explicitly mentioned in 
the software license. In contrast, using software as a library with 
functions linked into a new piece of code will usually require that the 
software library is mentioned in the copyright notice of the new code.

The corresponding distinction for data would be mining the data in
contrast to quoting or annotating it. Data mining ends up in fragments
of the data or numbers that no longer carry copyright. This also
means that data used for data mining will, strictly speaking, not need
to be attributed, as no copyrighted material is left in the final
output, regardless of any attribution requirements. However, quoting or
annotating data will leave visible copyrighted fragments in the new
work, so in that case the source needs to be attributed.

From a practical point of view, attribution in data and the copyright
notice in software are therefore the same thing, so in order not to 
become too bogged down with detailed distinctions in our tag set we can 
safely subsume both under BY, but we may wish to add the above reasoning 
in some document, as it is perhaps not obvious to everyone.

> - I also
> had a lot of thought on the semantics of ShareAlike (SA) when it
> comes to FOSS licences: looking around, it seems there's a confusion
> between "share alike" of the resource itself (i.e. the licence that
> comes with the source code when it is included in a modified version)
> and the "share alike" of the derivatives as a whole (i.e. the source
> code but also the changes I've made to the code); at CLARIN & in CC
> "share alike" refers to the licence of the derivatives; thus, I think
> only GNU licences are really SA, but another check would be
> appreciated.

Share alike ties in with the above reasoning. If there is a copyrighted 
piece of work in the final work, share alike means that the whole thing 
needs to be shared on conditions similar to the original work.

If there is no old material restricted by copyright in the final work 
(e.g. data mining), we do not have a derivative work but a new work. A 
new work has a license that can be determined by the creator of the new 
work and the creator is therefore by definition not bound by any share 
alike conditions.
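
As a rough illustration only, the reasoning above on attribution and
share alike could be sketched as follows. The function name, the
encoding of tags as Python strings and the boolean flag are assumptions
made for this sketch; it is not part of any CLARIN tool and it is not
legal advice.

```python
def obligations(source_tags, retains_copyrighted_material):
    """Return the source obligations that carry over to the final work.

    source_tags: set of tags on the source, e.g. {"BY", "SA"}
        (hypothetical encoding of the tag set discussed in this thread).
    retains_copyrighted_material: True for quoting or annotating,
        False for data mining output that keeps no copyrighted fragments.
    """
    if not retains_copyrighted_material:
        # New work: the creator decides the license; nothing carries over.
        return set()
    # Derivative work: attribution and share alike follow the source.
    return set(source_tags) & {"BY", "SA"}


print(obligations({"BY", "SA"}, retains_copyrighted_material=False))  # set()
print(obligations({"BY", "SA"}, retains_copyrighted_material=True))
```

The only point of the sketch is that the question "is copyrighted
material left in the final work?" is asked first, and BY and SA are
only considered afterwards.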

> - I have used ACA in the broad sense, covering
> "research", "science", "historical or phonological research" etc. And
> this brings to a general comment: we are making interpretations of
> the licences and the rights statements that providers make; there
> should be a warning, thus, to the end-users of this and that these
> mappings are made to facilitate the faceted browsing only, and that
> the rights statements should be thoroughly checked before the user
> actually uses the resource. This is important because we have no way
> of checking whether the provider agrees with this mapping.

When the data is curated into a CLARIN center, we will usually be able
to discuss with the depositor. If the deposition is fully automated, we 
are allowed to put the responsibility on the depositor.

However, as an organization, a CLARIN center will still most likely be 
liable for the interpretation of the tags regardless of how much it 
tries to blame the depositor, so even if the procedure is fully 
automated, it would be wise to check the data that is deposited and make
sure that the license is reasonable before starting to distribute it.

If a CLARIN center has no access to the data, but only has a link to 
some external repository distributing the data, the repository will of 
course have final responsibility for the conditions of use. This also
highlights the fact that the tags are only metadata and not the real
license conditions, so as a general statement in the terms of service,
it is the responsibility of the end-user to become acquainted with the
original licenses.

> - for some
> cases, I have used the google translate to check the rights statement
> in order to map it; if a native speaker can check (we all know the
> quality of google..) it would be appreciated!

Thanks for the effort. We should of course offer this to the whole CLIC 
community for crowd-sourcing when the time comes.

> - there are cases (that
> I have also marked on the googlesheet) where the rights statement
> should be mapped to two clearly different "licences", e.g. "free
> available for purely academic institutions; fees apply for commercial
> institutions" should be mapped to ACA & RES;FF.

Seems like a possible solution in this case, and this seems to be one of 
only two cases, so it is not worth making a big issue of it.

The other case is: "free for academic use by non-commercial 
organisation; 15.000 euro for commercial use and for use by commercial 
organisation; user licence required", which maps to ACA;NC & RES;FF.

More generally, however, having two different sets of license categories
in one record could make it unclear which tags go together. An easy
solution is of course to make two metadata records at the source. It is
still possible to refer to the original data with one PID.
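
To make the grouping problem concrete, here is a minimal sketch of how
the two cases above could be kept apart inside one record, assuming that
"&" separates license alternatives and ";" separates tags within one
alternative, as in the notation used in this thread. The function name,
the record layout and the PID value are hypothetical.

```python
def parse_tag_sets(mapping):
    """Split e.g. 'ACA;NC & RES;FF' into one list of tags per alternative."""
    return [
        [tag.strip() for tag in alternative.split(";") if tag.strip()]
        for alternative in mapping.split("&")
    ]


# One record, one PID, two clearly separated sets of license tags.
record = {
    "pid": "hypothetical-pid",  # placeholder, not a real identifier
    "licenses": parse_tag_sets("ACA;NC & RES;FF"),
}
print(record["licenses"])  # [['ACA', 'NC'], ['RES', 'FF']]
```

Keeping each alternative as its own list avoids flattening the tags
into ACA;NC;RES;FF, where it would no longer be visible that NC belongs
to the academic alternative only.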

> Matej, thanks for the explanations! And to your questions: - for the
> META-SHARE "availability" element, yes you are correct: basically
> only CC-0 and the like are "unrestricted"; honestly, given what I
> have seen in the way it has been filled in, it's safer to totally
> disregard it, as you suggest. We are currently going through an
> update of META-SHARE and hopefully correct some errors we have
> spotted in the metadata together with the providers, but, anyway,
> "licence" & "restrictionsOfUse" give you more feedback.

OK. Sounds good.

> - and I think
> it's best to keep NC as not equivalent to ACA; so, CC-NC is PUB;NC,
> while "free for academic use" will be mapped to ACA.

Agree.

--
Krister

