<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:0cm;
font-size:12.0pt;
font-family:"Times New Roman","serif";
color:black;}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";
color:black;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;
color:black;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.query-item-label
{mso-style-name:query-item-label;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:672875788;
mso-list-template-ids:1567157366;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:36.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:72.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:108.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:144.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:180.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:216.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:252.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:288.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:324.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Dear Dieter,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I had a brief look at this. I am a bit hesitant to have them removed them from the VLO.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Though it is true that these records lead to a lot of Dutch-language values for the facet ‘Subject’, I believe that ‘subject’ is not suited as a facet with
a restricted number of values. Currently there are more than 51k possible values, and it is unlikely that this will decrease. Taking out the Academia collection will not really solve this problem. One should see this facet as facet for string search in the
values of a limited number of fields, not really as a facet with restricted values or a small number of values<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I see many values that are in different languages than English, including German, French, Spanish, Portuguese, Chinese (or Japanese?), codes that are incomprehensible
without a legend (e.g. </span><span class="query-item-label">sh85091588), and some of these are complete phrases or even sentences, so perhaps it is better to reconsider this facet.</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">One possible approach, which might somewhat solve the problem you mention, it appears to me, could be to mark all fields from which the subject facet is derived
for the language they are in, and make use of that fact so that users can search for strings in a particular language. Such an addition to the Academia records could be done fully automatically (since all are in Dutch), or it could be added at the curation
level. How difficult it is to do this automatically for metadata from other origins I do not know.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">As to access: If I understood well, access to these resource will be given to all academic organization in the Netherlands soon, without costs. It is still
limited to organisations in the Netherlands, though. But that should not lead to exclusion from the VLO. The VLO contains many descriptions of resources with limitations on access. Having these metadata in might lead researchers from outside the Netherlands
to them, and they can arrange access in an ad-hoc way, if that is crucial to their research. If these metadata would not be in the VLO, these data might not be found at all. After all, we want all researchers to find their data via one entry point: the CLARIN
VLO. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Jan<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"> Dieter Van Uytvanck [mailto:dieter@clarin.eu]
<br>
<b>Sent:</b> dinsdag 30 januari 2018 12:15<br>
<b>To:</b> Odijk, J.E.J.M. (Jan); tf-curation@lists.clarin.eu<br>
<b>Subject:</b> curation-related request about the Nederlands Instituut voor Beeld en Geluid Academia collectie<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p>Dear Jan,<o:p></o:p></p>
<p>Recently during a meeting on the quality of the metadata in the VLO the <a href="https://vlo.clarin.eu/?fqType=collection:or&fq=collection:Nederlands+Instituut+voor+Beeld+en+Geluid+Academia+collectie">
Beeld en Geluid academia collection</a> was brought up as a source of problematic metadata, mainly because:<o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
the metadata per record is rather sparse and purely in Dutch (<a href="https://vlo.clarin.eu/data/clarin/results/cmdi/Nederands_Instituut_voor_Beeld_en_Geluid_OAI_PMH_repository/oai_beeldengeluid_nl_Expressie_3844948.xml">example</a>), leading to many "noise"
entries for the facet Subject<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
access to these resources is limited (to a <a href="https://www.academia.nl/licentiehouders">
subset</a> of the Dutch academic organisations)<o:p></o:p></li></ul>
<p>Since I realize it was not trivial to create all this metadata during the CLARIN-NL project, I was wondering what your opinion on this is. Could it be an option to remove this collection from the VLO?<o:p></o:p></p>
<p>best regards,<o:p></o:p></p>
<pre>-- <o:p></o:p></pre>
<pre>Dieter Van Uytvanck<o:p></o:p></pre>
<pre>Technical Director CLARIN ERIC<o:p></o:p></pre>
<pre><a href="http://www.clarin.eu">www.clarin.eu</a> | tel. +31-(0)850091363 | skype: dietervu.mpi<o:p></o:p></pre>
</div>
</body>
</html>