<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
tt
{mso-style-priority:99;
font-family:"Courier New";}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0cm;
margin-right:0cm;
margin-bottom:0cm;
margin-left:36.0pt;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;
color:black;}
span.E-MailFormatvorlage19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.icon
{mso-style-name:icon;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:418406756;
mso-list-type:hybrid;
mso-list-template-ids:-1916533210 134807567 134807577 134807579 134807567 134807577 134807579 134807567 134807577 134807579;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1
{mso-list-id:1746413099;
mso-list-type:hybrid;
mso-list-template-ids:-161841188 2124441208 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ascii-font-family:"Times New Roman";
mso-fareast-font-family:"Times New Roman";
mso-hansi-font-family:"Times New Roman";
mso-bidi-font-family:"Times New Roman";}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:"Courier New";}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-18.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Dear all, <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I found out only very late that there was a follow-up on the license issue right after the conference (see email below).<o:p></o:p></p>
<p class="MsoNormal">Penny, were you able to proceed on that?<o:p></o:p></p>
<p class="MsoNormal">Meanwhile, we have experimented quite a bit and compiled information, so here is our current take on this from our (TF Curation / ACDH-OEAW) side:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We have put an overview in clarin-trac [1] and would like to collect further findings and decisions there as we go along.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Main points:<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>Some of the concepts are linked to both facets (not necessarily bad, but a hint that we don’t have a clear distinction between them)<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>A normalisation file is in use, but it is incomplete: new, unmapped values exist, some of which are obviously in completely the wrong place (like size in kB)<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0 level1 lfo2"><![if !supportLists]><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>With the current concept mapping we cover only some 60,000 out of 800,000 records (roughly 7.5%)!<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regarding 2: the Normalization<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The current normalization distinguishes three to four values: Free; Free for academic use; Restricted; Upon request (in line with the PUB/ACA/RES laundry tags).<o:p></o:p></p>
<p class="MsoNormal">This sounds easy, but as far as I could gather, it is problematic (in many ways).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In Wroclaw, we discussed with Krister an alternative approach:<o:p></o:p></p>
<p class="MsoNormal">We could try to map to the license categories as defined [2] by the Legal Issues Committee and also available in the License Category Calculator [3]. That way we would avoid the problematic reduction while still keeping the “laundry-tag”
approach, and we would be in sync with the Legal Issues Committee recommendations. Also, each of these atomic tags is well defined, and most of them are broadly used on the web.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We could employ the decomposition approach here, in line with what we are trying to adopt for resourceType and other facets. That means we wouldn’t have the facet values [“PUB”, “PUB+BY”, “PUB+BY+SA”] but rather [“PUB”, “BY”, “SA”].<o:p></o:p></p>
<p class="MsoNormal">Allowing multiple values for the facet in each record, in combination with the (already implemented) multi-select feature in VLO, should cover all use cases and be more ergonomic (e.g. if I am interested only in the Non-Commercial
clause, I need to select only one facet value and don’t have to search for all the combinations that contain NC).<o:p></o:p></p>
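<p class="MsoNormal">To make the decomposition concrete, here is a minimal Python sketch. It assumes that combined categories are written with “+” as the separator, as in the example above; the record ids, the sample values and the helper names <tt>decompose</tt> and <tt>select</tt> are made up for illustration and are not part of the VLO code.<o:p></o:p></p>
<pre>
```python
# Hypothetical sample records; ids and values are invented for illustration.
records = {
    "rec-1": "PUB",
    "rec-2": "PUB+BY",
    "rec-3": "PUB+BY+SA",
    "rec-4": "ACA+NC",
}


def decompose(license_value):
    """Split a combined category such as 'PUB+BY+SA' into its atomic tags."""
    return [tag.strip() for tag in license_value.split("+") if tag.strip()]


def select(records, wanted_tag):
    """Return the sorted ids of all records whose tag list contains wanted_tag."""
    return sorted(
        rid for rid, value in records.items() if wanted_tag in decompose(value)
    )


# Each record gets a multi-valued facet instead of one combined value.
facet_values = {rid: decompose(value) for rid, value in records.items()}
print(facet_values["rec-3"])
print(select(records, "NC"))
```
</pre>
<p class="MsoNormal">With the multi-valued facet, selecting the single value NC is enough to find every record carrying that clause, whatever the other tags are.<o:p></o:p></p>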
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">There is already a <a href="https://github.com/clarin-eric/VLO/blob/master/vlo-commons/src/main/resources/LicenseAvailabilityMap.xml">
<span class="icon"></span>normalisation map used in production</a> [4] (committed 2015-04-23), but there are new values that are not mapped yet. We compiled a
<a href="https://drive.google.com/open?id=1Pf8Jk_P7RaA-7-dj8fcLOKNH5DjprraFEWXgQ3FvVtQ">
<span class="icon"></span>normalisation map as gsheet</a> [5] that contains the existing mappings (see the normalisation map above) plus the newly encountered values that are not yet normalized. The values come from elements annotated with concepts linked to one of the two facets License/Availability.<o:p></o:p></p>
<p class="MsoNormal">If we agree on the decomposition approach, this list would need to be reviewed completely, but it’s just around 240 entries.<o:p></o:p></p>
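<p class="MsoNormal">As a rough illustration of how unmapped values can be spotted, here is a small Python sketch. The three mappings are the ones quoted in Twan's mail below; the dict layout is an assumption and not the format of LicenseAvailabilityMap.xml, and the harvested values and the function name <tt>normalise</tt> are hypothetical.<o:p></o:p></p>
<pre>
```python
from collections import Counter

# Three entries taken from the mappings quoted in the forwarded mail;
# the dict layout is an assumption, not the format of LicenseAvailabilityMap.xml.
NORMALISATION_MAP = {
    "CLARIN_ACA-NC": "Free for academic use",
    "attribution": "Free",
    "noRedistribution": "Restricted",
}


def normalise(raw_values, mapping):
    """Return (normalised values, counter of raw values without a mapping)."""
    normalised = []
    unmapped = Counter()
    for raw in raw_values:
        key = raw.strip()
        if key in mapping:
            normalised.append(mapping[key])
        else:
            unmapped[key] += 1
    return normalised, unmapped


# Hypothetical harvested values, including one that is in the wrong place.
harvested = ["attribution", "CLARIN_ACA-NC", "512 kB", "noRedistribution", "512 kB"]
normalised, unmapped = normalise(harvested, NORMALISATION_MAP)
print(normalised)
print(unmapped.most_common())
```
</pre>
<p class="MsoNormal">The counter of unmapped values is the kind of list that would need to be reviewed by hand.<o:p></o:p></p>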
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Regarding 3: missing values<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Here we have 3 possible situations:<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l1 level1 lfo4"><![if !supportLists]><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>Profile does not have any information about licensing/availability (worst case)<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l1 level1 lfo4"><![if !supportLists]><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>Profile has information about licensing/availability, but the element is not linked to a concept, or the concept is not in the facet mapping<o:p></o:p></p>
<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l1 level1 lfo4"><![if !supportLists]><span style="mso-list:Ignore">3.<span style="font:7.0pt "Times New Roman"">
</span></span><![endif]>Profile is well defined and linked to one of the concepts in the facet mapping, but the information is simply not filled in in the record.<o:p></o:p></p>
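<p class="MsoNormal">The three situations can be told apart mechanically. The following Python sketch shows one way to do it, under the assumption that for each profile we know which elements carry licensing/availability information and which concept (if any) each is linked to. The concept identifiers, the data layout and the function name <tt>classify</tt> are hypothetical.<o:p></o:p></p>
<pre>
```python
# Hypothetical identifiers of the concepts included in the facet mapping.
FACET_CONCEPTS = {"concept:licence", "concept:restrictionsOfUse"}


def classify(profile_elements, record_values):
    """Classify one record into situation 1, 2 or 3, or 0 if a value is present.

    profile_elements maps each licensing/availability element of the profile
    to its concept link (None if the element is not linked to any concept).
    record_values maps element names to the value found in the record.
    """
    if not profile_elements:
        return 1  # the profile has no licensing/availability element at all
    mapped = [
        name
        for name, concept in profile_elements.items()
        if concept in FACET_CONCEPTS
    ]
    if not mapped:
        return 2  # element exists, but no concept from the facet mapping
    if not any(record_values.get(name) for name in mapped):
        return 3  # profile is fine, the record leaves the element empty
    return 0  # the facet receives a value


print(classify({}, {}))
print(classify({"licence": None}, {"licence": "CC-BY"}))
print(classify({"licence": "concept:licence"}, {}))
print(classify({"licence": "concept:licence"}, {"licence": "CC-BY"}))
```
</pre>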
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We prepared a list of <a href="https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1">
<span class="icon"></span>profile/facet coverage</a> [6], with special attention to the availability and license facets. In particular, the individual concepts contributing to these facets are also shown (see the
<tt><span style="font-size:10.0pt">c-*</span></tt> columns).<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If you want to further investigate this issue, I strongly recommend our experimental instance of the
<a href="https://minerva.arz.oeaw.ac.at/vlo/">VLO on Minerva</a> [7].<o:p></o:p></p>
<p class="MsoNormal">It features normalized and unnormalized facets, explicit [missing values], profile ID and name as facets, a data provider facet showing the actual data provider, multi-value selection, and special facets for the concepts contributing to
the availability facet (i.e. every concept is shown as a separate facet; these are marked with the prefix c-).<o:p></o:p></p>
<p class="MsoNormal">With all this you can all too easily see that the biggest contributor to missing values in the availability facet is Meertens [8] (playing the blame game ;)). You can equally easily see which profiles are involved (just open the profile
Name facet).<o:p></o:p></p>
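<p class="MsoNormal">If you want to produce the same kind of link for other data providers, the following Python sketch builds it from the filter parameters visible in link [8]. The function name is hypothetical, and it assumes that other providers are addressed with the same <tt>dataProvider</tt> and <tt>availability</tt> filters as in that link.<o:p></o:p></p>
<pre>
```python
from urllib.parse import urlencode

# Search URL of the experimental instance, taken from the Meertens link [8].
BASE_URL = "http://minerva.arz.oeaw.ac.at/vlo/search"


def missing_availability_url(data_provider):
    """Build a search URL for a provider's records without an availability value."""
    filters = [
        ("fq", "dataProvider:" + data_provider),
        ("fq", "availability:[missing value]"),
    ]
    # safe=":" keeps the field separator readable; brackets and spaces are escaped.
    return BASE_URL + "?" + urlencode(filters, safe=":")


print(missing_availability_url("Meertens_Institute_Metadata_Repository"))
```
</pre>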
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">So much for our findings so far. We would love to hear what you think; perhaps we could (or should) arrange a telco to discuss how to proceed.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Best,<o:p></o:p></p>
<p class="MsoNormal">Matej<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">[1] <a href="https://trac.clarin.eu/wiki/Taskforces/Curation/ValueNormalization/License">
https://trac.clarin.eu/wiki/Taskforces/Curation/ValueNormalization/License</a><o:p></o:p></p>
<p class="MsoNormal">[2] https://www.clarin.eu/content/license-categories<o:p></o:p></p>
<p class="MsoNormal">[3] https://www.clarin.eu/content/clarin-license-category-calculator<o:p></o:p></p>
<div>
<p class="MsoNormal">[4] <a href="https://github.com/clarin-eric/VLO/blob/master/vlo-commons/src/main/resources/LicenseAvailabilityMap.xml">
https://github.com/clarin-eric/VLO/blob/master/vlo-commons/src/main/resources/LicenseAvailabilityMap.xml</a><o:p></o:p></p>
<p class="MsoNormal">[5] <a href="https://drive.google.com/open?id=1Pf8Jk_P7RaA-7-dj8fcLOKNH5DjprraFEWXgQ3FvVtQ">
https://drive.google.com/open?id=1Pf8Jk_P7RaA-7-dj8fcLOKNH5DjprraFEWXgQ3FvVtQ</a><o:p></o:p></p>
<p class="MsoNormal">[6] <a href="https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1">
https://docs.google.com/spreadsheets/d/1eeOr0ShOWxdY8BLzp62LDyfGgHo0gZ95Myw0qauzLxU/edit#gid=0&vpid=A1</a><o:p></o:p></p>
<p class="MsoNormal">[7] <a href="https://minerva.arz.oeaw.ac.at/vlo/">https://minerva.arz.oeaw.ac.at/vlo/</a><o:p></o:p></p>
<p class="MsoNormal">[8] <a href="http://minerva.arz.oeaw.ac.at/vlo/search?fq=dataProvider:Meertens_Institute_Metadata_Repository&fq=availability:%5Bmissing+value%5D">
http://minerva.arz.oeaw.ac.at/vlo/search?fq=dataProvider:Meertens_Institute_Metadata_Repository&fq=availability:%5Bmissing+value%5D</a><o:p></o:p></p>
<p class="MsoNormal"><br>
<br>
-------- Forwarded Message -------- <o:p></o:p></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td nowrap="" valign="top" style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" align="right" style="text-align:right"><b>Subject: <o:p></o:p></b></p>
</td>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal">Re: LicenseAvailabilityMap.xml in vlo/trunk/vlo-commons/src/main/resources – CLARIN Trac<o:p></o:p></p>
</td>
</tr>
<tr>
<td nowrap="" valign="top" style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" align="right" style="text-align:right"><b>Date: <o:p></o:p></b></p>
</td>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal">Sat, 17 Oct 2015 10:34:04 +0200<o:p></o:p></p>
</td>
</tr>
<tr>
<td nowrap="" valign="top" style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" align="right" style="text-align:right"><b>From: <o:p></o:p></b></p>
</td>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal">Twan Goosen <a href="mailto:twan.goosen@mpi.nl">&lt;twan.goosen@mpi.nl&gt;</a><o:p></o:p></p>
</td>
</tr>
<tr>
<td nowrap="" valign="top" style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" align="right" style="text-align:right"><b>To: <o:p></o:p></b></p>
</td>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal">Penny Labropoulou <a href="mailto:penny@ilsp.gr">&lt;penny@ilsp.gr&gt;</a><o:p></o:p></p>
</td>
</tr>
<tr>
<td nowrap="" valign="top" style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal" align="right" style="text-align:right"><b>CC: <o:p></o:p></b></p>
</td>
<td style="padding:0cm 0cm 0cm 0cm">
<p class="MsoNormal"><span lang="DE-AT">Thomas Eckart </span><a href="mailto:teckart@informatik.uni-leipzig.de"><span lang="DE-AT">&lt;teckart@informatik.uni-leipzig.de&gt;</span></a><span lang="DE-AT">, Matej Durco
</span><a href="mailto:xnrn@gmx.net"><span lang="DE-AT">&lt;xnrn@gmx.net&gt;</span></a><span lang="DE-AT"><o:p></o:p></span></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><span lang="DE-AT"><br>
<br>
</span>That would be great. To get more information on the mapping from the values in resourceInfo records to VLO facets, you can enter the profile id 'clarin.eu:cr1:p_1361876010571' in the input box of the "check profile" form at
<a href="https://lux17.mpi.nl/isocat/clarin/vlo/mapping/index.html">&lt;https://lux17.mpi.nl/isocat/clarin/vlo/mapping/index.html&gt;</a>.<br>
<br>
This will give you quite a lot of information but the relevant sections would be<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Facet: availability<br>
Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2457_45bbaa1a-7002-2ecd-ab9d-57a189f694a6">
http://hdl.handle.net/11459/CCR_C-2457_45bbaa1a-7002-2ecd-ab9d-57a189f694a6</a><br>
/c:CMD/c:Components/c:resourceInfo/c:distributionInfo/c:licenceInfo/c:licence/text()<br>
xpath accepted<br>
Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8">
http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8</a><br>
/c:CMD/c:Components/c:resourceInfo/c:distributionInfo/c:licenceInfo/c:restrictionsOfUse/text()<br>
xpath accepted<o:p></o:p></p>
</blockquote>
<p class="MsoNormal">and<br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">Facet: license<br>
Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2457_45bbaa1a-7002-2ecd-ab9d-57a189f694a6">
http://hdl.handle.net/11459/CCR_C-2457_45bbaa1a-7002-2ecd-ab9d-57a189f694a6</a><br>
/c:CMD/c:Components/c:resourceInfo/c:distributionInfo/c:licenceInfo/c:licence/text()<br>
xpath accepted<br>
Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8">
http://hdl.handle.net/11459/CCR_C-2453_1f0c3ea5-7966-ae11-d3c6-448424d4e6e8</a><br>
/c:CMD/c:Components/c:resourceInfo/c:distributionInfo/c:licenceInfo/c:restrictionsOfUse/text()<br>
xpath accepted<o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
So the two fields 'licence' and 'restrictionsOfUse' are mapped to both facets (via the concepts 'availability' and 'license'). By looking at the mapping file we can see why this results in the three different availability levels we are now getting in
the VLO (at least in the case of <a href="http://catalog-clarin.esc.rzg.mpg.de/vlo/search?q=perso&fq=country:Finland">
&lt;http://catalog-clarin.esc.rzg.mpg.de/vlo/search?q=perso&fq=country:Finland&gt;</a>):<br>
- license 'CLARIN_ACA-NC' maps to 'Free for academic use'<br>
- restriction 'attribution' maps to 'Free'<br>
- restriction 'noRedistribution' maps to 'Restricted'<br>
<br>
The next step is to decide what would be the desired mapping (logic).<br>
<br>
Best,<br>
Twan<o:p></o:p></p>
<div>
<p class="MsoNormal">On 16/10/15 22:14, Penny Labropoulou wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<div>
<p class="MsoNormal">No problem! Glad to do it - it was more or less on our agenda for CLIC, so I'll have a look and let you know of the outcomes.<o:p></o:p></p>
</div>
<p class="MsoNormal">Best,<o:p></o:p></p>
</div>
<p class="MsoNormal">Penny<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 16 October 2015 at 16:06, Twan Goosen &lt;<a href="mailto:twan.goosen@mpi.nl" target="_blank">twan.goosen@mpi.nl</a>&gt; wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<p class="MsoNormal">Thanks for your offer to look through this mapping!<br>
I will also send you a link to Menzo's mapping tool.<br>
<br>
<a href="https://trac.clarin.eu/browser/vlo/trunk/vlo-commons/src/main/resources/LicenseAvailabilityMap.xml" target="_blank">https://trac.clarin.eu/browser/vlo/trunk/vlo-commons/src/main/resources/LicenseAvailabilityMap.xml</a>
<o:p></o:p></p>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</blockquote>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>