<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi,<div class=""><br class=""></div><div class="">A more general question on this topic: is it a CLARIN recommendation for all records to have English metadata? I don’t mean dropping localised metadata, but always having English metadata as a baseline. Does a more high-level recommendation (like RDA) exist for this topic of localisation?</div><div class=""><br class=""></div><div class="">Because I think </div><div class="">- it would solve most of our “multilingual” problems</div><div class="">- I like Jan’s argument about researchers from abroad finding the data, except I definitely wouldn’t. And still, a lot of data can be useful, even when one can’t speak the language.</div><div class=""><br class=""></div><div class="">Pavel</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On 7 Mar 2018, at 10:59, Odijk, J.E.J.M. 
(Jan) <<a href="mailto:j.odijk@uu.nl" class="">j.odijk@uu.nl</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="WordSection1" style="page: WordSection1; font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);"><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">Dear Dieter,<o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">I had a brief look at this. I am a bit hesitant to have them removed from the VLO.<o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">Though it is true that these records lead to a lot of Dutch-language values for the facet ‘Subject’, I believe that ‘subject’ is not suited as a facet with a restricted number of values. 
Currently there are more than 51k possible values, and it is unlikely that this will decrease. Taking out the Academia collection will not really solve this problem. One should see this facet as a facet for string search in the values of a limited number of fields, not really as a facet with restricted values or a small number of values.<o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">I see many values that are in languages other than English, including German, French, Spanish, Portuguese, Chinese (or Japanese?), codes that are incomprehensible without a legend (e.g.<span class="Apple-converted-space"> </span></span><span class="query-item-label">sh85091588), and some of these are complete phrases or even sentences, so perhaps it is better to reconsider this facet.</span><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">One possible approach, which I think might partly solve the problem you mention, would be to mark all fields from which the subject facet is derived with the language they are in, and make use of that fact so 
that users can search for strings in a particular language. Such an addition to the Academia records could be done fully automatically (since all are in Dutch), or it could be added at the curation level. How difficult it is to do this automatically for metadata from other origins I do not know.<o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">As to access: if I understood well, access to these resources will soon be given to all academic organizations in the Netherlands, free of charge. It is still limited to organisations in the Netherlands, though. But that should not lead to exclusion from the VLO. The VLO contains many descriptions of resources with limitations on access. Having these metadata in the VLO might lead researchers from outside the Netherlands to them, and they can arrange access in an ad-hoc way, if that is crucial to their research. If these metadata were not in the VLO, these data might not be found at all. 
After all, we want all researchers to find their data via one entry point: the CLARIN VLO.<span class="Apple-converted-space"> </span><o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class=""><o:p class=""> </o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);" class="">Jan<o:p class=""></o:p></span></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><o:p class=""> </o:p></div><div class=""><div style="border-style: solid none none; border-top-width: 1pt; border-top-color: rgb(181, 196, 223); padding: 3pt 0cm 0cm;" class=""><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><b class=""><span style="font-size: 10pt; font-family: Tahoma, sans-serif; color: windowtext;" class="">From:</span></b><span style="font-size: 10pt; font-family: Tahoma, sans-serif; color: windowtext;" class=""><span class="Apple-converted-space"> </span>Dieter Van Uytvanck [<a href="mailto:dieter@clarin.eu" class="">mailto:dieter@clarin.eu</a>]<span class="Apple-converted-space"> </span><br class=""><b class="">Sent:</b><span class="Apple-converted-space"> </span>dinsdag 30 januari 2018 12:15<br class=""><b class="">To:</b><span class="Apple-converted-space"> </span>Odijk, J.E.J.M. 
(Jan); <a href="mailto:tf-curation@lists.clarin.eu" class="">tf-curation@lists.clarin.eu</a><br class=""><b class="">Subject:</b><span class="Apple-converted-space"> </span>curation-related request about the Nederlands Instituut voor Beeld en Geluid Academia collectie<o:p class=""></o:p></span></div></div></div><div style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;" class=""><o:p class=""> </o:p></div><p style="margin-right: 0cm; margin-left: 0cm; font-size: 12pt; font-family: "Times New Roman", serif;" class="">Dear Jan,<o:p class=""></o:p></p><p style="margin-right: 0cm; margin-left: 0cm; font-size: 12pt; font-family: "Times New Roman", serif;" class="">Recently during a meeting on the quality of the metadata in the VLO the<span class="Apple-converted-space"> </span><a href="https://vlo.clarin.eu/?fqType=collection:or&fq=collection:Nederlands+Instituut+voor+Beeld+en+Geluid+Academia+collectie" style="color: purple; text-decoration: underline;" class="">Beeld en Geluid academia collection</a><span class="Apple-converted-space"> </span>was brought up as a source of problematic metadata, mainly because:<o:p class=""></o:p></p><ul type="disc" style="margin-bottom: 0cm;" class=""><li class="MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;">the metadata per record is rather sparse and purely in Dutch (<a href="https://vlo.clarin.eu/data/clarin/results/cmdi/Nederands_Instituut_voor_Beeld_en_Geluid_OAI_PMH_repository/oai_beeldengeluid_nl_Expressie_3844948.xml" style="color: purple; text-decoration: underline;" class="">example</a>), leading to many "noise" entries for the facet Subject<o:p class=""></o:p></li><li class="MsoNormal" style="margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman", serif;">access to these resources is limited (to a<span class="Apple-converted-space"> </span><a href="https://www.academia.nl/licentiehouders" style="color: purple; 
text-decoration: underline;" class="">subset</a><span class="Apple-converted-space"> </span>of the Dutch academic organisations)<o:p class=""></o:p></li></ul><p style="margin-right: 0cm; margin-left: 0cm; font-size: 12pt; font-family: "Times New Roman", serif;" class="">Since I realize it was not trivial to create all this metadata during the CLARIN-NL project, I was wondering what your opinion on this is. Could it be an option to remove this collection from the VLO?<o:p class=""></o:p></p><p style="margin-right: 0cm; margin-left: 0cm; font-size: 12pt; font-family: "Times New Roman", serif;" class="">best regards,<o:p class=""></o:p></p><pre style="margin: 0cm 0cm 0.0001pt; font-size: 10pt; font-family: "Courier New";" class="">-- <o:p class=""></o:p></pre><pre style="margin: 0cm 0cm 0.0001pt; font-size: 10pt; font-family: "Courier New";" class="">Dieter Van Uytvanck<o:p class=""></o:p></pre><pre style="margin: 0cm 0cm 0.0001pt; font-size: 10pt; font-family: "Courier New";" class="">Technical Director CLARIN ERIC<o:p class=""></o:p></pre><pre style="margin: 0cm 0cm 0.0001pt; font-size: 10pt; font-family: "Courier New";" class=""><a href="http://www.clarin.eu/" style="color: purple; text-decoration: underline;" class="">www.clarin.eu</a> | tel. 
+31-(0)850091363 | skype: dietervu.mpi<o:p class=""></o:p></pre></div><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class="">_______________________________________________</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class="">Tf-curation mailing list</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class=""><a href="mailto:Tf-curation@lists.clarin.eu" class="">Tf-curation@lists.clarin.eu</a></span><br 
style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); float: none; display: inline !important;" class=""><a href="https://lists.clarin.eu/cgi-bin/mailman/listinfo/tf-curation" class="">https://lists.clarin.eu/cgi-bin/mailman/listinfo/tf-curation</a></span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);" class=""></div></blockquote></div><br class=""></div></div></body></html>