<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#2E74B5;
font-weight:normal;
font-style:normal;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1630356639;
mso-list-template-ids:-1572172626;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:36.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:72.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:108.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:144.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:180.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:216.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:252.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:288.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:324.0pt;
mso-level-number-position:left;
text-indent:-18.0pt;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
ol
{margin-bottom:0cm;}
ul
{margin-bottom:0cm;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body bgcolor=white lang=EL link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Dear Jakob, dear all,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Thank you for the very useful overview of social media resources. <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>We have filled in the missing information on the Greek social media listed, directly on the spreadsheet (row 10). <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>I have also asked some colleagues if they are aware of any resources of the types described in your mail; I’ll forward any answers I get.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Best regards,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Maria<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'> <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'><o:p> </o:p></span></p><div><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Maria Gavrilidou<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>ILSP/R.C. 
‘Athena’<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Epidavrou & Artemidos 6<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>GR-15125 Marousi<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Athens<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Greece<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Tel.: +30 210 6875441<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>Email: <a href="mailto:maria@ilsp.athena-innovation.gr"><span style='color:#0563C1'>maria@ilsp.athena-innovation.gr</span></a> <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'>URL: <a href="http://www.ilsp.gr/"><span style='color:#0563C1'>www.ilsp.gr</span></a><o:p></o:p></span></p></div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#2E74B5'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> userinvolvement-bounces@lists.clarin.eu [mailto:userinvolvement-bounces@lists.clarin.eu] <b>On Behalf Of </b>Lenardic, Jakob<br><b>Sent:</b> Wednesday, April 26, 2017 10:51 AM<br><b>To:</b> userinvolvement@lists.clarin.eu<br><b>Subject:</b> [Userinvolvement] REQUEST – help on compiling overview of social media corpora<o:p></o:p></span></p></div></div><p 
class=MsoNormal><o:p> </o:p></p><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>Dear all,<o:p></o:p></span></p><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>Darja and I have been working on an overview of corpora containing data from social media platforms (e.g. Twitter, Facebook, blogs and fora) available in CLARIN member countries. We are doing this in light of the forthcoming <a href="https://www.clarin.eu/event/2017/clarin-plus-workshop-creation-and-use-social-media-resources" target="_blank">CLARIN-PLUS workshop on the data of social media that will be held on 18 and 19 May in Kaunas, Lithuania</a>. <o:p></o:p></span></p><p style='text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>We are interested in identifying three types of resources:<o:p></o:p></span></p><blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>1) corpora of social media data that can be used for various kinds of linguistic analyses, such as the <a href="http://metashare.csc.fi/repository/browse/the-suomi-24-corpus-2016h2/eb323320f44d11e6b70e005056be118e30dc4e74e4654a4a8b3e8789ef31c0d0/" target="_blank">Finnish Suomi 24 Corpus</a>,<o:p></o:p></span></p><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>2) smaller, specialized datasets for particular NLP tasks, such as the <a href="https://www.clarin.si/repository/xmlui/handle/11356/1085" target="_blank">CMC training corpus Janes-Tag 1.2</a>, and<o:p></o:p></span></p><p style='text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>3) NLP tools adapted or developed for (noisy) social media language, such as <a href="https://github.com/clarinsi/csmtiser" target="_blank">csmtiser</a>, which 
is a tool for text normalisation via character-level machine translation developed by CLARIN.SI members.<o:p></o:p></span></p></blockquote><p style='text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>In terms of the metadata, we are looking for the following information:<o:p></o:p></span></p><ul type=disc><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Language(s)<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Size (in tokens)<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Period (from-to)<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Annotation & tools<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Availability<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>License<o:p></o:p></span></li><li class=MsoNormal style='color:black;mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><span style='font-family:"Calibri","sans-serif"'>Key publication<o:p></o:p></span></li></ul><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>The results of our preliminary investigation can be seen in the <a 
href="https://docs.google.com/spreadsheets/d/1sbTvCTjmkXFjVfA2kOUoj1NRDm48R7UiLmabIjHLMRQ/edit?usp=sharing">Google spreadsheet</a>. As you can see, we haven’t been able to find relevant corpora/datasets/tools for Bulgaria, Denmark, Lithuania, Latvia, Portugal and Hungary. For several of the corpora/datasets/tools that we have identified, some metadata are incomplete. Finally, there may be corpora/datasets/tools that we are not yet aware of and would be grateful to learn about.<o:p></o:p></span></p><p style='margin-bottom:12.0pt;text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>For this reason, I would like to invite you to fill in the missing data on behalf of your consortium in the spreadsheet, or to send me the missing information by email if that’s easier for you. I am looking forward to your contributions by <strong><span style='font-family:"Calibri","sans-serif"'>8 May.</span></strong><o:p></o:p></span></p><p style='text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>Best,<o:p></o:p></span></p><p style='text-align:justify'><span style='font-family:"Calibri","sans-serif";color:black'>Jakob<o:p></o:p></span></p></div></body></html>