[Tf-curation] l/a synthesis - unavailable
Penny Labropoulou
penny at ilsp.gr
Mon Feb 22 11:26:31 CET 2016
Indeed a very interesting thread!
I would also like to learn more about the PID for virtual collections and search results.
As for the licence issue, a couple of points:
- in the update of META-SHARE (soon to be released), we have re-ordered the elements so that the licenceInfo component can be used for only one licence; thus, all the tags inside this component will be linked to one licence only, and, as I said in my previous message, we will check their consistency together with the metadata providers;
- resources can be linked to one or more distributionInfo components, which allows for multiple licensing. This is still not a perfect representation, but we're resolving some issues. So, in the case of the ELFA instance, one way to describe it is as 2 distributions of the same corpus, each associated with a different licence: if I understand correctly, you could say that the corpus as a whole in the downloadable format is licensed under a RES licence, while the output of the concordancer is licensed under a PUB licence. The metadata elements that we have do not describe this perfectly, but it's one way of seeing it. An alternative, of course, is to treat it as you suggest, as two different corpora, most probably with some link between them.
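A minimal sketch of the "two distributions, one licence each" reading. It assumes a simplified, hypothetical in-memory model (the class and field names and the licence names are illustrative, not the META-SHARE schema): because each licenceInfo carries exactly one licence, the record as a whole can map to two categories without any single licence being ambiguous.

```python
from dataclasses import dataclass, field


@dataclass
class LicenceInfo:
    # Assumption: after the update, one licenceInfo describes exactly one licence.
    licence: str
    category: str  # hypothetical field holding "PUB", "ACA" or "RES"
    restrictions: list[str] = field(default_factory=list)


@dataclass
class DistributionInfo:
    form: str  # hypothetical label, e.g. "downloadable corpus"
    licence_info: LicenceInfo


@dataclass
class Resource:
    name: str
    distributions: list[DistributionInfo]

    def categories(self) -> set[str]:
        """Collect the availability categories over all distributions."""
        return {d.licence_info.category for d in self.distributions}


# Hypothetical ELFA description: same corpus, two distributions, two licences.
elfa = Resource("ELFA", [
    DistributionInfo("downloadable corpus", LicenceInfo("hypothetical-res-licence", "RES")),
    DistributionInfo("concordancer output", LicenceInfo("hypothetical-pub-licence", "PUB")),
])
print(sorted(elfa.categories()))  # ['PUB', 'RES']
```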
The interesting thing is that in this scenario, we have various forms of the same resource:
- the resource as a whole, distributed under some licence,
- the queryable form of the resource, again distributed under some licence, but with exactly the same contents as the above,
- the output of specific queries, the contents of which, of course, differ depending on the query, and which are licensed under the second licence.
Should all of these receive a different PID?
p
-----Original Message-----
From: Krister Lindén [mailto:krister.linden at helsinki.fi]
Sent: Monday, February 22, 2016 11:54 AM
To: Penny Labropoulou <penny at ilsp.gr>; 'Durco, Matej' <Matej.Durco at oeaw.ac.at>
Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas Eckart' <teckart at informatik.uni-leipzig.de>; 'Ostojic, Davor' <Davor.Ostojic at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>; tf-curation at lists.clarin.eu; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander Maijers' <sander at clarin.eu>
Subject: Re: AW: AW: l/a synthesis - unavailable
Thanks Penny for pointing out the inconsistent labeling.
Matej, could CLARIN ERIC provide a list of inconsistently labeled records to metadata providers? We can begin the cleaning process with the CLARIN Centers you harvest from and worry about non-CLARIN Centers later. I guess this is a good starting point for a joint effort with CLIC on making sure that there are a) licenses for resources, b) legal metadata for new licenses and c) consistent metadata for existing licenses.
As it happens, FIN-CLARIN seems to be the culprit in this ELFA instance, as the problematic record belongs to us. I have tried to discourage the use of totally different licenses in the same metadata record, because the legal metadata records are unstructured lists, so in the end we do not know which subcategory goes with which license. The solution is to have different records if the licenses allow totally different uses of the same corpus.
A more prototypical case is newspaper archives, which may be available to individual researchers as RES. However, concordance excerpts can be provided with a concordance tool to everyone for redistribution as PUB.
Labeling the original corpus as PUB is misleading, as the original corpus cannot be redistributed, while labeling the corpus as RES is also misleading, as anyone can make searches in the concordance service. We solve this by having two licenses: 1. the original resource is RES; 2. the derivative resource is PUB. The derivative procedure is implemented by the concordance tool, which scrambles sentences and provides max. 1000 records at a time. Although there is one underlying resource with two uses, having all of this in one legal metadata record would be confusing. It is more convenient to say that we have two different corpora: a real corpus with individual access and a virtual corpus with public access. (In some cases, the original corpus is even unavailable to researchers while the virtual corpus is still publicly accessible.)
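A minimal sketch of the derivative procedure described above, assuming that "scrambling sentences" means shuffling the order of the matching sentences and that matching is a simple substring test. The function name and parameters are hypothetical; this is not the actual concordance tool.

```python
import random

MAX_RECORDS = 1000  # per-request cap mentioned in the thread


def concordance_view(sentences, keyword, seed=None, limit=MAX_RECORDS):
    """Return at most `limit` sentences containing `keyword`, in shuffled order.

    Shuffling breaks the original sentence order, which makes it harder to
    reassemble the underlying corpus from a response.
    """
    hits = [s for s in sentences if keyword in s]
    random.Random(seed).shuffle(hits)
    return hits[:limit]


corpus = [f"item {i} about news" for i in range(2500)]
view = concordance_view(corpus, "news", seed=42)
print(len(view))  # 1000
```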
This solves the metadata problem but now we have a technical problem:
what is the PID of the virtual corpus? Arguably, we could create one PID for each set of search conditions as they will produce the same virtual corpus view of the underlying real corpus each time. Is there a better solution? What do your PIDs for virtual corpus collections and federated search results point to?
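One way to make "one PID for each set of search conditions" concrete is to derive a deterministic key from a canonical serialisation of the conditions, which could then be registered as the suffix of a PID. This is a hypothetical sketch: the function name and the example URN are illustrative, registration with a PID service is not shown, and the key is only meaningful if the underlying corpus and the tool's behaviour (including any scrambling) stay fixed.

```python
import hashlib
import json


def virtual_corpus_key(corpus_pid: str, conditions: dict) -> str:
    """Derive a stable identifier suffix for one set of search conditions.

    Conditions are serialised with sorted keys, so the same conditions given
    in a different order yield the same key.
    """
    canonical = json.dumps(
        {"corpus": corpus_pid, "conditions": conditions},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


a = virtual_corpus_key("urn:nbn:fi:lb-EXAMPLE", {"query": "news", "limit": 1000})
b = virtual_corpus_key("urn:nbn:fi:lb-EXAMPLE", {"limit": 1000, "query": "news"})
print(a == b)  # True
```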
Regards,
Krister
On 22.2.2016 09:11, Penny Labropoulou wrote:
> Hi all!
> Just to say that I'm on the same page as Krister about keeping the unavailable resources in the Unspec category, for the very good reasons he brings forward.
>
> Another comment, though: looking at http://beta-vlo.clarin.eu/record?4&q=availability:FF&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-201403262, I have noticed a problem that comes from the providers' metadata, namely the erroneous use of the licensing tags: the licence is a META-SHARE Commercial one, which allows both commercial and non-commercial use, but the provider has selected the "academic-nonCommercialUse" value, which results in a record classified as both RES and ACA. We are currently working on an update of META-SHARE, in which we are also looking at inconsistent use of tags, so hopefully these will be corrected soon.
> Nevertheless, I think there should also be a checking mechanism in the mapping procedure to prevent double classification of a resource, or at least to notify the user of such problems. Is this possible? Note that there will be cases where a resource can be classified with two or more categories, but this should be due to different licences, e.g. a resource made available for free for research purposes (ACA) and the same resource licensed with an FF licence for commercial purposes (RES). I couldn't find a way to make such a query and see how this is represented.
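A minimal sketch of the checking mechanism Penny asks for, assuming the mapping step can report its output tags grouped per licence (a hypothetical input shape, not the VLO importer's actual data model). Several main categories spread over different licences pass; several main categories derived from the same licence are flagged.

```python
MAIN_CATEGORIES = {"PUB", "ACA", "RES"}


def check_single_category(tags_per_licence: dict) -> list:
    """Warn about licences whose tags map to more than one main category."""
    warnings = []
    for licence, tags in tags_per_licence.items():
        mains = sorted(set(tags) & MAIN_CATEGORIES)
        if len(mains) > 1:
            warnings.append(f"{licence}: conflicting main categories {', '.join(mains)}")
    return warnings


# One licence tagged both RES and ACA, as in the record discussed above.
print(check_single_category({"meta-share-commercial": {"RES", "ACA", "NC"}}))
# ['meta-share-commercial: conflicting main categories ACA, RES']
```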
>
> And last but not least, I would like to congratulate the VLO team for the work they are doing!
>
> Best,
> Penny
> -----Original Message-----
> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
> Sent: Sunday, February 21, 2016 8:29 PM
> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Penny Labropoulou
> <penny at ilsp.gr>
> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
> <Davor.Ostojic at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>;
> tf-curation at lists.clarin.eu; Twan Goosen <twan.goosen at mpi.nl>; 'Sander
> Maijers' <sander at clarin.eu>
> Subject: Re: AW: AW: l/a synthesis - unavailable
>
> Matej,
>
> RES indicates that the resource is somehow available to an individual if the person fulfills the declared conditions. There is no specific limit on how cumbersome the conditions can be for the resource still to be classified as RES. However, if you have to be prepared to negotiate the license and clear the rights to get access, a resource is normally classified as unavailable, because you can always negotiate.
>
> In the end, I guess it comes down to how much effort we wish to spend on distinguishing not yet negotiated licenses from licenses that have not been properly labeled, but for now I would be inclined to keep the unavailable resources in the Unspec category as well.
>
> Regards,
> Krister
>
> [PS. For an alternative solution you could argue as follows:
>
> If you know that there is currently no license for a resource, from a metadata point of view it could perhaps be interpreted as "RES *". In that case, it is not a question of the license having deficient legal metadata (which you wish to indicate by Unspec), but on the contrary, you know quite well that the license is not yet negotiated.
>
> In addition, the asterisk indicates non-standard conditions, i.e. some other condition than the conditions we already have tags for. The need to negotiate is a rather non-standard condition so in that case I guess it is safe to say that "the resource is available to an individual on non-standard conditions (= RES *)".
>
> As the asterisk refers to a real license, this should be acknowledged in the license type field, e.g. with an exclamation mark indicating a "not yet negotiated license type", in which case the asterisk refers to non-standard conditions in an unknown license, so as not to give the impression that this is a pre-negotiated CLARIN RES license with non-standard conditions.]
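The alternative in the PS can be sketched as a small display rule. This is a hypothetical illustration of the proposal only (the function and parameter names are assumptions, and the thread does not say the rule was adopted): a licence that still has to be negotiated is shown as "RES *" with "!" in the licence type field, while known licences keep their type and get an asterisk only for non-standard conditions.

```python
def availability_display(category, license_type, nonstandard=False, negotiated=True):
    """Return (availability label, licence-type label) for a record."""
    if not negotiated:
        # The need to negotiate is itself a non-standard condition: "RES *",
        # and "!" marks the licence type as not yet negotiated.
        return "RES *", "!"
    label = f"{category} *" if nonstandard else category
    return label, license_type


print(availability_display("RES", None, negotiated=False))  # ('RES *', '!')
```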
>
>
>
> On 19.2.2016 19:03, Durco, Matej wrote:
>> As usual I have to have the last word ;)
>>
>> This is one more request for comment mainly for Krister and Penny:
>>
>> Only while processing the suggested mappings did we notice that Krister and Penny mapped some values to "unavailable" and "undefined".
>> While I think we can safely reduce "undefined" to "unspecified" (meaning
>> any information about l/a is missing), it seems that you (Krister, Penny) would like to see it emphasized if the metadata explicitly states that the resource is "not available" at all.
>> However, I thought we agreed to have just the 4 main categories (PUB/ACA/RES/Unspec).
>> So the question is: would you propose to have it as a 5th main
>> category, or wouldn't it be enough to map it to RES;Other = RES*?
>> I really wouldn't like to see such an unappealing category as "totally unavailable" being pushed into the users' attention. I would hope that RES* actually covers the case.
>>
>> Thank you for commenting on this.
>>
>> best,
>> Matej
>>
>>
>> -----Original Message-----
>> From: Twan Goosen [mailto:twan.goosen at mpi.nl]
>> Sent: Friday, February 19, 2016 16:13
>> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Krister Lindén
>> <krister.linden at helsinki.fi>; Penny Labropoulou <penny at ilsp.gr>;
>> 'Sander Maijers' <sander at clarin.eu>
>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
>> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto, Go
>> <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck' <dieter at clarin.eu>
>> Subject: Re: AW: l/a synthesis
>>
>> Hi everyone,
>>
>> On 12/02/16 13:55, Durco, Matej wrote:
>>> [..]
>>> You will be able to explore the mapping in the usual way on Minerva
>>> beginning next week, but also in the new, much more intuitive way that Twan has now implemented in the beta release.
>>> The facet will feature only the four main categories for search, but
>>> the secondary categories will be displayed with individual records.
>>> All are represented by dedicated icons, giving a nice visual cue.
>>> I guess Twan will introduce this in detail next week.
>> We managed to deploy the beta version, just in time not to break Matej's promise ;) You can find it at <http://beta-vlo.clarin.eu/>. Note that the importer is still running as I'm writing this, so more records will become available between now and ~5pm CET.
>>
>> To guide you a little bit, here are some demonstration queries/pages:
>> * Records open for public or academic access:
>> <http://beta-vlo.clarin.eu/search?fqType=availability:or&fq=availability:PUB&fq=availability:ACA>
>> * Records that provide their exact licence:
>> <http://beta-vlo.clarin.eu/search?q=license:%7B*+TO+*%5D>
>> * Example of a record that provides a nice amount of legal information:
>> <http://beta-vlo.clarin.eu/record?docId=http_58__47__47_hdl.handle.net_47_11858_47_00-203C-0000-0023-8323-1>
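The first demonstration query above combines availability levels by repeating the `fq` parameter and setting `fqType=availability:or`. A minimal sketch of building such URLs, assuming only the parameter pattern visible in that link (the helper name is hypothetical, this is not a documented VLO API, and the beta instance may no longer be online):

```python
from urllib.parse import urlencode

BASE = "http://beta-vlo.clarin.eu/search"


def availability_query(*levels: str) -> str:
    """Build a search URL that ORs the given availability levels."""
    params = [("fqType", "availability:or")]
    params += [("fq", f"availability:{level}") for level in levels]
    # safe=":" keeps the colons literal, matching the demonstration link.
    return f"{BASE}?{urlencode(params, safe=':')}"


print(availability_query("PUB", "ACA"))
# http://beta-vlo.clarin.eu/search?fqType=availability:or&fq=availability:PUB&fq=availability:ACA
```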
>>
>> Two functional TODOs still to be completed before the stable release:
>> - Boosting of public/academic records in search results (this could also reduce the slightly awkward sight of a front page filled with "?" level records)
>> - Addition of missing icons for secondary availability 'laundry tags' (see e.g. <http://beta-vlo.clarin.eu/record?q=availability:FF&docId=http_58__47__47_urn.fi_47_urn_58_nbn_58_fi_58_lb-201403262>)
>>
>> Please play around with the availability level selector, have a look at the availability/licence information in the search results and on the record pages and the way they are presented. The descriptions shown for the various 'tags' (as tooltips on the search page or shown next to the icons on the record page) can still be improved, and so can the icons and of course the mapping itself. So, to quote Matej:
>>> Thank you again for now
>>> and we are looking forward to your comments and observations on the new release next week.
>> And have a nice weekend :)
>>
>> Best,
>> Twan
>>
>> P.S. you can find a 'complete' list of changes in this version at <https://github.com/clarin-eric/VLO/blob/development/CHANGES.txt> if you're interested.
>>>
>>> Regarding the facet labels and definitions, we have now decided to stay
>>> with the "availability" label, as it seems to better reflect the user's point of view.
>>> This will then actually also be the only facet with an explicit label,
>>> because it will be among the "search facets". The other facets
>>> will be processed in a more verbose form. You will see next week ;)
>>>
>>>
>>> Best,
>>> Matej
>>>
>>> -----Original Message-----
>>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>>> Sent: Friday, February 05, 2016 16:09
>>> To: Penny Labropoulou <penny at ilsp.gr>; Durco, Matej
>>> <Matej.Durco at oeaw.ac.at>; 'Twan Goosen' <twan.goosen at mpi.nl>;
>>> 'Sander Maijers' <sander at clarin.eu>
>>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>; 'Thomas
>>> Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto,
>>> Go <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck'
>>> <dieter at clarin.eu>
>>> Subject: Re: l/a synthesis
>>>
>>> Since we are interpreting what others have written, we need to take a conservative view not to give the impression that the data is more freely available than it is. We are currently in the process of interpreting the usage conditions in the metadata by going from "unspecified" to something slightly more informative. We therefore interpret the resources to be as freely accessible as we safely can. If the metadata provider is unhappy with how narrowly his legal metadata is interpreted in the VLO, he can set the elements more exactly in his CMDI metadata.
>>>
>>> (Non-CLARIN Centers have the same opportunity. For a CLARIN Center,
>>> CLARIN ERIC has added responsibility for auditing the quality of the
>>> relationship between the licenses and the CMDI metadata.)
>>>
>>> Penny's advice to be careful is good but it actually works in the other direction based on the principle that one cannot give more rights than one has. Without additional info, "for research" can be narrowly approximated by at least +LRT without saying whether the data is available for all (PUB), for a trusted community (ACA) or upon personal request to the owner (RES). However, using only +LRT would likely keep the resource in the "unspecified" main category, whereas using ACA is a rather safe bet stating that the data is available for research while also assuming that the downloader needs to be identified to access the data, which in most cases is unfortunately still true for research data that is not explicitly licensed with one of the open or public licenses.
>>>
>>> If we have additional knowledge that +ID is not required by the license of the resource, then Penny's suggestion for PUB+research can be narrowly approximated with tags saying that at least PUB+LRT is safe to assume.
>>>
>>> Regards,
>>> Krister
>>>
>>>
>>> On 5.2.2016 11:19, Penny Labropoulou wrote:
>>>> Dear Matej and Krister,
>>>> If indeed the VLO harvests only from CLARIN centres, and following Krister's explanations, ok, let's have the tags explicit.
>>>> But I have the feeling that the VLO also harvests from other sources, and these may not all include CMDI metadata or, even more, a licensing category or even a licence (which should be imposed at least for new data!); if this is the case, then we are actually interpreting the providers' metadata, often just free-text statements, in which case we should be more careful, I think. If the providers simply state "for research" and we interpret that as ACA (as done in our Excel sheet), then the ID tag may be more than what the original providers ask for; if asked, they might have gone for PUB +research. In any case, the users are directed through the VLO to where the resource itself is made available, and there, the users will have to accept whatever licensing conditions the provider asks for, and it's up to the source distributor to enforce them. If the source distributor is a CLARIN centre, the ID will be imposed by our own policy, and it's clearly stated, as Krister says, in the agreement template.
>>>> I would like to see the ID being used as a way of facilitating access to resources for researchers in a trusted federation such as CLARIN, rather than a way of discouraging access to resources.
>>>> As said, if the VLO harvests only from CLARIN centres, just disregard all the above.
>>>> Best,
>>>> Penny
>>>>
>>>> -----Original Message-----
>>>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>>>> Sent: Friday, February 05, 2016 3:55 AM
>>>> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Penny Labropoulou
>>>> <penny at ilsp.gr>; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander Maijers'
>>>> <sander at clarin.eu>
>>>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>;
>>>> 'Thomas Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic, Davor
>>>> <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu; Sugimoto,
>>>> Go <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck'
>>>> <dieter at clarin.eu>
>>>> Subject: Re: l/a synthesis
>>>>
>>>> Dear Matej,
>>>>
>>>> Regarding the explicitness of tags: In the current Agreement templates for a normal ACA resource, the federated login, attribution and no redistribution conditions are made explicit. It would therefore be better to reflect this in the tags in the VLO for this automated update.
>>>>
>>>> Not having the tags explicit may cause liability for CLARIN in some cases, as it encourages unintended usage, whereas being slightly too strict in the labeling will have no legal implications. People will only be pleasantly surprised that some resources are more widely usable than they imagined.
>>>>
>>>> When CLARIN Centers provide their own licenses with tags already marked as CMDI components, they will be responsible for the labeling of their own licenses. If they do not e.g. require login for their particular brand of licenses "for (teaching, education and) research-purpose", they may leave out the ID tag, but if the ID tag is only a non-explicit assumption via the guidelines, the Centers will not even be able to leave it out, as it should always be implicitly read into the tag set.
>>>> (Note that the assessment of the license labeling should be part of
>>>> the regular CLARIN Center assessment procedure.)
>>>>
>>>> The "other" tag is there to draw the attention of the user to peculiar but relevant usage conditions similar to "only to be used on Tuesdays"
>>>> or the like. We can't have a tag for everything, but an asterisk is an indication that this license has conditions out of the ordinary. We are aware that recognizing what is out of the ordinary may be non-trivial.
>>>>
>>>> Regards,
>>>> Krister
>>>>
>>>>
>>>> On 3.2.2016 18:28, Durco, Matej wrote:
>>>>> Dear Krister,
>>>>>
>>>>> thank you a lot for the extensive response, this is really very helpful!
>>>>>
>>>>> In my view, your clarifications regarding NC/ACA and derivative data should definitely find their way into the public information about L/A [1].
>>>>>
>>>>> The hint that the three main categories imply certain subcategories (ACA => ID;BY;NORED) is also very helpful.
>>>>> I just wonder whether we want to make it *explicit* in the VLO (i.e. add the ID, BY and NORED attributes to every resource with the ACA tag), or just explain it (somewhere under [1]) and leave it implicit in the VLO.
>>>>>
>>>>> In the list you proposed to map a few "restricted ..." values with "*"
>>>>> (or other), which seems a bit counterintuitive, but I guess this has to do with the special meaning of "RES"... ?
>>>>>
>>>>> The next steps:
>>>>> We are now processing (Krister's) mapping into a normalization map as used by the VLO.
>>>>> We will apply it on our Minerva VLO instance first and let you inspect the new mappings probably on Monday.
>>>>> We will also tentatively try to map from the dc-concepts
>>>>> (dcterms:rights, dcterms:accessRights, dcterms:license) to see if we can get better coverage (the profile coverage analysis [2] suggests so). After a few days' validation and comment period, we would apply the mapping in the main VLO instance (and roll it out with version 3.4).
>>>>>
>>>>> There is one more thing we would like feedback on, especially from CLIC: the labels and definitions for the l/a-related facets.
>>>>> But I'll save that for a separate email.
>>>>>
>>>>> Thank you for all the input so far.
>>>>>
>>>>> Best,
>>>>> Matej
>>>>>
>>>>> [1] https://www.clarin.eu/content/license-categories
>>>>> [2] https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Krister Lindén [mailto:krister.linden at helsinki.fi]
>>>>> Sent: Sunday, January 31, 2016 22:26
>>>>> To: Durco, Matej <Matej.Durco at oeaw.ac.at>; Penny Labropoulou
>>>>> <penny at ilsp.gr>; 'Twan Goosen' <twan.goosen at mpi.nl>; 'Sander Maijers'
>>>>> <sander at clarin.eu>
>>>>> Cc: 'Menzo Windhouwer2' <menzo.windhouwer at meertens.knaw.nl>;
>>>>> 'Thomas Eckart' <teckart at informatik.uni-leipzig.de>; Ostojic,
>>>>> Davor <Davor.Ostojic at oeaw.ac.at>; tf-curation at lists.clarin.eu;
>>>>> Sugimoto, Go <Go.Sugimoto at oeaw.ac.at>; 'Dieter Van Uytvanck'
>>>>> <dieter at clarin.eu>
>>>>> Subject: Part I: Re: AW: AW: [Tf-curation] License/Availability was WG: Re: LicenseAvailabilityMap.xml in vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac
>>>>>
>>>>> Dear all,
>>>>>
>>>>> It seems I could have responded in one long email, but I chose to answer in three different parts and therefore I end up confirming some of the things here that Penny said in the next message, while also adding some explanations, but here goes.
>>>>>
>>>>> Part I:
>>>>>
>>>>> On 21.1.2016 17:04, Durco, Matej wrote:
>>>>>> ACA vs. NC
>>>>>> as you rightly commented in the gsheet
>>>>>>
>>>>>> 1)
>>>>>>
>>>>>> * ad: PUB/ACA/RES: yes, the goal is to have one of these 3
>>>>>> categories assigned to each record/resource. ... The way it is
>>>>>> solved in META-SHARE profiles gave me the idea of decomposing
>>>>>> into license categories. Namely, in the META-SHARE profile the
>>>>>> licenceInfo is quite rich: there is a licence element and the
>>>>>> repeated restrictionsOfUse element, so each record has more than
>>>>>> enough information to correctly map to both the main categories
>>>>>> and the optional ones (see example Helsinki corpus [1]).
>>>>>> Therefore I believe we can (and have to) be conservative in the
>>>>>> mapping and can avoid adding uncertain information. Correct me
>>>>>> if I am wrong, but "non-commercial use" can be safely mapped to
>>>>>> the License category "NC",
>>>>> Yes it can. There is currently a legal debate going on about what non-commercial means exactly, because it is a fuzzy concept, but whenever that discussion arrives at a conclusion, which may be as soon as the new directive for some kind of research exception emerges, that is the definition we will adopt.
>>>>>
>>>>>> however, mapping it to the main category PUB, ACA or RES is
>>>>>> problematic without more information. Long story short, even
>>>>>> though we cannot map each individual value in the normalization
>>>>>> map to one of the main categories, in the end each and every
>>>>>> record/resource (that provides the appropriate information) will be assigned to one of the three.
>>>>> Agree.
>>>>>
>>>>>> * ad: atomic vs. combined License Categories: We want to try
>>>>>> decomposition, i.e. atomic categories as separate facet values
>>>>>> (BY, SA, ...)
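Decomposition into atomic facet values could look roughly like the sketch below. It assumes a naming pattern (an optional "CLARIN_" prefix, parts joined by "-", "+", ";" or spaces) inferred from labels that appear in this thread, such as CLARIN_ACA-NC, PUB+ID and ACA;META; the function and the atom list are hypothetical, and parts it does not recognise (including "*") are simply dropped.

```python
import re

# Atomic tags mentioned in this thread; a real list would come from the CLARIN categories.
KNOWN_ATOMS = {"PUB", "ACA", "RES", "BY", "SA", "NC", "ID", "NORED", "META", "LRT", "FF"}


def decompose(label: str) -> list[str]:
    """Split a combined licence-category label into atomic facet values."""
    body = label.upper().removeprefix("CLARIN_")  # assumed prefix; needs Python 3.9+
    parts = re.split(r"[-+;\s]+", body)
    return [p for p in parts if p in KNOWN_ATOMS]


print(decompose("CLARIN_ACA-NC"))  # ['ACA', 'NC']
```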
>>>>> Good.
>>>>>
>>>>>> * ad: indication of license/availability information being
>>>>>> unavailable: in the dev-instance we use a placeholder "[missing
>>>>>> value]" (actually in all facets), but it needs to be decided if
>>>>>> we want to expose this in the main VLO, especially given the many
>>>>>> records falling into this category. We cannot say "non-standard license", because we don't know.
>>>>>> We can only say "unspecified" or synonyms thereof, "unspecified"
>>>>>> sometimes being used as a value itself.
>>>>> Unspecified is OK.
>>>>>
>>>>>> 2) ad: C-* facets: It's actually the opposite; these are special
>>>>>> facets exposing the values of the individual concepts that contribute to
>>>>>> the actual availability/license facets (concept-facet mapping).
>>>>>> The overview of these concepts, incl. definitions (copied from the
>>>>>> source) and links to the CCR, is in the trac-wiki [1]. These C-*
>>>>>> facets are meant precisely to make it possible to identify where the
>>>>>> individual values in the availability facet came from.
>>>>>> Identifying the underlying concepts is more or less the closest
>>>>>> we can (easily) get you, as the VLO does not keep the information about which actual profile/element a given value comes from.
>>>>>> (This is also in response to the 2nd point of 3).) However, one can
>>>>>> get this information in the detail view (looking into the full
>>>>>> metadata record). Not super convenient, but well. And as said, the
>>>>>> ProfileName and DataProvider facets help you identify the
>>>>>> provider and profile in question.
>>>>> OK.
>>>>>
>>>>>> ad: licence type vs. license type: Yes, indeed there is both a
>>>>>> "licence type" and a "license type" concept. They still come from
>>>>>> ISOcat (DC-3800 and DC-5439 resp.). I added a snapshot from the
>>>>>> SMC-browser to the wiki page [2] showing where these two come from
>>>>>> (in which profiles they are used and what the context of these
>>>>>> profiles was; I am also attaching the snapshot). They are used by
>>>>>> 3 and 5 profiles respectively, so I guess it would be possible in
>>>>>> this case to ask the authors (with the help of the CCR and CMDI
>>>>>> team) to merge these two and correct the profiles accordingly.
>>>>> Good idea.
>>>>>
>>>>>> Two more points from my side: As far as I understood, there is a conflict
>>>>>> in the understanding of PUB/ACA/RES in CLARIN and in META-SHARE,
>>>>>> with META-SHARE classifying everything beyond CC0 as availability:restrictedUse.
>>>>>> Is that correct? The example above [1] also delivers the CLARIN
>>>>>> compliant licence (CLARIN_ACA-NC), but I doubt that this is the
>>>>>> case for all META-SHARE records. So in my understanding we need
>>>>>> to disregard the availability information in the resourceInfo profile
>>>>>> and just consider the licence and restrictionsOfUse. Would you agree?
>>>>> I seem to remember that META-SHARE made a point of declaring everything except CC0 restricted, which may be true from a legal point of view, although I don't think they used this for any particular purpose as all the regular CC licenses then also fall into the META-SHARE restricted category.
>>>>>
>>>>> In CLARIN, the RES category was intended to be used for resources "restricted to individual use" typically containing personal data preventing them from being opened to a broader category of users. This is often referred to only as "restricted use" due to the RES acronym and therefore misinterpreted in view of the META-SHARE terminology.
>>>>>
>>>>>> The next question that is not clear to me: - Is NC equivalent
>>>>>> with ACA? Because then we have a problem with CC-NC?
>>>>> No. NC is not equivalent with ACA.
>>>>>
>>>>> In its basic form, ACA means "resources available for educational, teaching and research purposes" including commercial research, so we need NC to specify that an ACA resource is available only for non-commercial purposes.
>>>>>
>>>>> In addition, ACA implies ID i.e. "A user needs to be authenticated
>>>>> or identified." and BY as that is required by law in most EU
>>>>> countries anyway. (This is why there is CC0 to explicitly say that
>>>>> we don't care about attribution.)
>>>>>
>>>>> Authentication implies more than self-identification for collecting usage statistics, so someone needs to verify the identity. For this we need an affiliation to some community that can authenticate the user.
>>>>> We currently offer two flavors of affiliation: EDU and META. If nothing else is mentioned, EDU is assumed (which is the pure ACA), but if META is mentioned (by saying ACA+META), we also acknowledge that the META community, which includes industrial partners, may do the authentication. How they do it is up to them.
>>>>>
>>>>> In contrast to ACA resources, we may also have resources available for any purpose that still require self-identification for collecting usage statistics, e.g. the IP address may be collected, or some email address, or whatever means of identification the distributor of the resource chooses. This does not restrict access to the resource to a particular community, so we can put such resources in the category PUB+ID.
>>>>>
>>>>> In order to be able to control the ID for authentication, ACA also implies NORED. If the resource could be distributed freely to other researchers, automated authentication could not be implemented and would also not make sense.
>>>>>
>>>>> More generally the following implications hold:
>>>>>
>>>>> ACA => ID;BY;NORED
>>>>> ACA;META => ID;BY
>>>>> RES => ID;BY;NORED
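Making the implied tags explicit, as discussed earlier in the thread, amounts to applying these three rules to a record's tag set. A minimal sketch, assuming tags are plain strings and that ACA together with META is the more specific rule (so NORED is not added in that case); the function name is hypothetical.

```python
def expand_implied(tags: set[str]) -> set[str]:
    """Add the tags implied by the main category (ACA, ACA;META, RES)."""
    if "ACA" in tags and "META" in tags:
        implied = {"ID", "BY"}           # ACA;META => ID;BY
    elif "ACA" in tags or "RES" in tags:
        implied = {"ID", "BY", "NORED"}  # ACA => ID;BY;NORED, RES => ID;BY;NORED
    else:
        implied = set()                  # e.g. PUB or PUB+ID: nothing implied here
    return set(tags) | implied


print(sorted(expand_implied({"ACA", "NC"})))  # ['ACA', 'BY', 'ID', 'NC', 'NORED']
```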
>>>>>
>>>>>> I hope I did not add more confusion.
>>>>> I hope my answers clarified some parts.
>>>>>
>>>>> --
>>>>> Krister
>>>>>
>>>>
>>
>>
>
>