[Tf-curation] Part II: Re: AW: License/Availability was WG: Re: LicenseAvailabilityMap.xml in vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac
Krister Lindén
krister.linden at helsinki.fi
Sun Jan 31 22:26:36 CET 2016
Dear all,
This part focuses on some aspects of derivative data and its implications.
Part II:
On 26.1.2016 16:44, Penny Labropoulou wrote:
> - I have
> added a NOTICE tag; I have used this mainly in FOSS licences (e.g.
> Apache, GNU licences) because what they require is to retain in the
> modified versions "the copyright notice" (and BTW, also the policy
> notice, but we don't have any tag for that); I have the feeling that
> this is not exactly the same with "attribution" which asks for
> explicit acknowledgement of the author/creator/owner of the resource,
> even when using the resource. For instance, if I write a paper and
> use a tagger to tag a corpus, I have to attribute the corpus owner
> (if licensed with CC-BY) but I have no such obligation for the use of
> the tagger (if licensed with GPL). If my understanding is incorrect,
> please change them all to BY. Krister, what do you think?
You are correct that the copyright notice only covers the claim of
ownership of the software but makes no claim on attribution when using
the software. However, the story does not end there.
Using software on a resource does not normally attach any copyright to
the resource that is processed, unless this is explicitly mentioned in
the software license. In contrast, using software as a library with
functions linked into a new piece of code will usually require that the
software library is mentioned in the copyright notice of the new code.
The corresponding distinction for data would be mining the data in
contrast to quoting or annotating it. Data mining ends up in fragments
of the data or numbers that no longer contain copyright. This will also
mean that data used for data mining will strictly speaking not need to
be attributed as no copyright is left in the final output regardless of
any attribution requirements. However, quoting or annotating data will
leave visible copyrighted fragments in the new work, so in that case the
source needs to be attributed.
From a practical point of view attribution in data and the copyright
notice in software are therefore the same thing, so in order not to
become too bogged down with detailed distinctions in our tag set we can
safely subsume both under BY, but we may wish to add the above reasoning
in some document, as it is perhaps not obvious to everyone.
> - I also
> had a lot of thought on the semantics of ShareAlike (SA) when it
> comes to FOSS licences: looking around, it seems there's a confusion
> between "share alike" of the resource itself (i.e. the licence that
> comes with the source code when it is included in a modified version)
> and the "share alike" of the derivatives as a whole (i.e. the source
> code but also the changes I've made to the code); at CLARIN & in CC
> "share alike" refers to the licence of the derivatives; thus, I think
> only GNU licences are really SA, but another check would be
> appreciated.
Share alike ties in with the above reasoning. If there is a copyrighted
piece of work in the final work, share alike means that the whole thing
needs to be shared on conditions similar to the original work.
If there is no old material restricted by copyright in the final work
(e.g. data mining), we do not have a derivative work but a new work. A
new work has a license that can be determined by the creator of the new
work and the creator is therefore by definition not bound by any share
alike conditions.
> - I have used ACA in the broad sense, covering
> "research", "science", "historical or phonological research" etc. And
> this brings to a general comment: we are making interpretations of
> the licences and the rights statements that providers make; there
> should be a warning, thus, to the end-users of this and that these
> mappings are made to facilitate the faceted browsing only, and that
> the rights statements should be thoroughly checked before the user
> actually uses the resource. This is important because we have no way
> of checking whether the provider agrees with this mapping.
When the data is curated into a CLARIN Center, we will usually be able
to discuss with the depositor. If the deposition is fully automated, we
are allowed to put the responsibility on the depositor.
However, as an organization, a CLARIN center will still most likely be
liable for the interpretation of the tags regardless of how much it
tries to blame the depositor, so even if the procedure is fully
automated, it would be wise to check the data that is deposited and make
sure that the license is reasonable before starting to distribute it.
If a CLARIN center has no access to the data, but only has a link to
some external repository distributing the data, the repository will of
course have final responsibility for the conditions of use. This also
highlights the fact that the tags are only metadata and not the real
license conditions, so as a general claim in the terms of service, we
have that it is the responsibility of the end-user to acquaint himself
with the original licenses.
> - for some
> cases, I have used the google translate to check the rights statement
> in order to map it; if a native speaker can check (we all know the
> quality of google..) it would be appreciated!
Thanks for the effort. We should of course offer this to the whole CLIC
community for crowd-sourcing when the time comes.
> - there are cases (that
> I have also marked on the googlesheet) where the rights statement
> should be mapped to two clearly different "licences", e.g. "free
> available for purely academic institutions; fees apply for commercial
> institutions" should be mapped to ACA & RES;FF.
Seems like a possible solution in this case, and this seems to be one of
only two cases, so it is not worth making a big issue of it.
The other case is: "free for academic use by non-commercial
organisation; 15.000 euro for commercial use and for use by commercial
organisation; user licence required", which maps to ACA;NC & RES;FF.
More generally, however, having two different sets of license categories
in one record could confuse which tags go together. An easy solution is
of course to make two metadata records at the source. It is still
possible to refer to the original data with one PID.
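To illustrate why the grouping matters, here is a minimal sketch in Python.
It is not taken from the VLO code: the function names are hypothetical, and
it assumes that "&" separates licence alternatives and ";" separates the
tags within one alternative, as in the two examples above.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical representation: each frozenset is one licence alternative
# whose tags belong together (e.g. ACA with NC).
LicenceAlternative = FrozenSet[str]


def parse_tag_sets(mapping: str) -> List[LicenceAlternative]:
    """Split a mapping such as 'ACA;NC & RES;FF' into separate tag sets.

    Assumption: '&' separates alternatives, ';' separates tags inside one
    alternative.
    """
    alternatives = []
    for part in mapping.split("&"):
        tags = frozenset(t.strip() for t in part.split(";") if t.strip())
        if tags:
            alternatives.append(tags)
    return alternatives


def flatten(alternatives: Iterable[LicenceAlternative]) -> FrozenSet[str]:
    """Merge all alternatives into one flat set, losing the grouping."""
    return frozenset().union(*alternatives)


tag_sets = parse_tag_sets("ACA;NC & RES;FF")
# tag_sets keeps two alternatives apart: {ACA, NC} and {RES, FF}.
flat = flatten(tag_sets)
# flat holds ACA, NC, RES and FF together, so it no longer says
# whether NC belongs with ACA or with RES.
```

Keeping the alternatives as separate sets corresponds to making two
metadata records; the flat set corresponds to putting all tags into one
record.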
> Matej, thanks for the explanations! And to your questions: - for the
> META-SHARE "availability" element, yes you are correct: basically
> only CC-0 and the like are "unrestricted"; honestly, given what I
> have seen in the way it has been filled in, it's safer to totally
> disregard it, as you suggest. We are currently going through an
> update of META-SHARE and hopefully correct some errors we have
> spotted in the metadata together with the providers, but, anyway,
> "licence" & "restrictionsOfUse" give you more feedback.
OK. Sounds good.
> - and I think
> it's best to keep NC as not equivalent to ACA; so, CC-NC is PUB;NC,
> while "free for academic use" will be mapped to ACA.
Agree.
--
Krister