<div dir="ltr">Dear all,<div><br></div><div>sorry again, this time not about an error, but about a somewhat fundamental oversight: </div><div>TCF, of course, then also needs a mime type which follows the same logic. I suggest:</div><div><br></div><div><span style="font-size:12.8px">text/xml;format-variant=weblicht-tcf</span><br></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">I'll use that in the revision of our web services until/unless I hear something to the contrary.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Best,</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Thomas</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 8, 2016 at 10:25 AM, Thomas Schmidt <span dir="ltr"><<a href="mailto:thomas.schmidt@ids-mannheim.de" target="_blank">thomas.schmidt@ids-mannheim.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sorry: as long as this discussion is the reference document, I should<br>
point out that I made a mistake:<br>
<span class=""><br>
> A parameter "token=0/1" can be added to indicate whether (=1) or<br>
> not (=0) the respective TEI file is tokenized (i.e. has &lt;w&gt; markup)<br>
<br>
</span>The name of the parameter as described by Bryan is "tokenized", not "token".<br>
<br>
- Thomas<br>
<br>
<br>
<br>
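The parameter name matters to any consumer that inspects the type string. As a minimal sketch (not code from any of the services discussed here), the following Python splits a type string into its media type and parameters and reads the "tokenized" flag. The function names parse_type_string and is_tokenized are hypothetical, and the parsing is deliberately naive: it assumes no semicolons inside quoted parameter values.<br>
<pre>
```python
def parse_type_string(value):
    """Split 'type/subtype; name=value; ...' into (media_type, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, _, val = raw.partition("=")
        # Parameter names are treated as case-insensitive; quotes are stripped naively.
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def is_tokenized(value):
    """Return True/False for tokenized=1/0, or None if the parameter is absent."""
    _, params = parse_type_string(value)
    flag = params.get("tokenized")
    return None if flag is None else flag == "1"


media_type, params = parse_type_string("text/tei+xml;format-variant=tei-dta; tokenized=1")
print(media_type, params)
```
</pre>
With this reading, a string that carries "token=1" instead of "tokenized=1" yields None, i.e. the flag is treated as absent.<br>
<br>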
On Fri, Jul 8, 2016 at 9:04 AM, Thomas Schmidt<br>
<div class="HOEnZb"><div class="h5"><<a href="mailto:thomas.schmidt@ids-mannheim.de">thomas.schmidt@ids-mannheim.de</a>> wrote:<br>
> Dear all,<br>
><br>
> in the absence of further input from the standards committee and before we<br>
> lose the momentum, I'd like to summarise our action plan according to the<br>
> discussion so far:<br>
><br>
> (1a) In WebLicht (in CLARIN in general?) ISO/TEI transcriptions of spoken<br>
> language will be identified by the MIME type<br>
> text/tei+xml;format-variant=tei-iso-spoken. A parameter "token=0/1" can be<br>
> added to indicate whether (=1) or not (=0) the respective TEI file is<br>
> tokenized (i.e. has &lt;w&gt; markup).<br>
> (1b) HZSK and myself will adapt the respective web services accordingly<br>
><br>
> (2a) In WebLicht (in CLARIN in general?) DTA/TEI files will be identified by<br>
> the MIME type text/tei+xml;format-variant=tei-dta. A parameter "token=0/1"<br>
> can be added to indicate whether (=1) or not (=0) the respective TEI file is<br>
> tokenized (i.e. has &lt;w&gt; markup).<br>
> (2b) Bryan Jurish will adapt the respective web services at BBAW accordingly<br>
><br>
> (3a) In WebLicht (in CLARIN in general?), EXMARaLDA Basic Transcriptions<br>
> will be identified by the MIME type text/xml; format-variant=exmaralda-exb<br>
> (3b) In WebLicht (in CLARIN in general?), FOLKER/OrthoNormal transcription<br>
> files will be identified by the MIME type text/xml;<br>
> format-variant=folker-fln<br>
> (3c) In WebLicht (in CLARIN in general?), Transcriber transcription files<br>
> will be identified by the MIME type text/xml; format-variant=transcriber-trs<br>
> (3d) HZSK and myself will adapt the respective web services accordingly<br>
><br>
> (4a) It would have to be checked (note the passive, I don't know who could<br>
> be in charge of this) whether competing MIME types for these file types are<br>
> already registered somewhere. I know that WebLicht already seems to have two<br>
> variants of EXMARaLDA transcriptions. The mechanisms specifying those would<br>
> probably have to be deprecated. Transcriber is also not unlikely to have<br>
> been given some kind of mimetype elsewhere in CLARIN.<br>
> (4b) Further relevant formats will be ELAN/EAF, CLAN/CHA and PRAAT/TextGrid<br>
> (the latter two being text, not XML formats). All three of them are also<br>
> likely to have been registered somewhere already, so "someone" (again, I<br>
> wouldn't know who) should check if mime types have been defined for those.<br>
><br>
> I guess that this is as good an answer as we can currently give to address<br>
> points 1-3 in Marie Hinrichs' list. @Marie: can you confirm that this is<br>
> sufficient for you, also to address point 4 in your list? In my understanding,<br>
> whatever works for WebLicht in this respect should also be a suitable basis<br>
> for a larger context (the SwitchBoard in particular?).<br>
><br>
> In my eyes, it remains crucial, however, that such standardization<br>
> "decisions" are centrally documented (including the information Tomaž<br>
> suggested). The CLARIN standards pages as they are now<br>
> (<a href="https://www.clarin.eu/content/standard-recommendations" rel="noreferrer" target="_blank">https://www.clarin.eu/content/standard-recommendations</a> /<br>
> <a href="http://clarin.ids-mannheim.de/standards/index.xq" rel="noreferrer" target="_blank">http://clarin.ids-mannheim.de/standards/index.xq</a> are the ones I know) are,<br>
> IMHO, incomplete, inconsistent and outdated, and they certainly do not<br>
> provide accurate information on the mime types. Any input from the standard<br>
> committee on this question would therefore still be much appreciated.<br>
><br>
> Best,<br>
><br>
> Thomas<br>
><br>
><br>
><br>
><br>
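The conventions in (1a) to (3c) can be sketched as a small lookup table. This is an illustrative Python sketch under assumptions, not an agreed implementation: the names BASE_TYPES and build_type_string are hypothetical, and the parameter is written as "tokenized", following the correction at the top of this thread.<br>
<pre>
```python
# Base media type per format-variant, as listed in points (1a) to (3c).
BASE_TYPES = {
    "tei-iso-spoken": "text/tei+xml",
    "tei-dta": "text/tei+xml",
    "exmaralda-exb": "text/xml",
    "folker-fln": "text/xml",
    "transcriber-trs": "text/xml",
}


def build_type_string(variant, tokenized=None):
    """Compose 'base; format-variant=...' plus an optional tokenized flag."""
    base = BASE_TYPES[variant]  # KeyError for variants not covered by the plan
    parts = [base, "format-variant=" + variant]
    if tokenized is not None:
        # The plan mentions the tokenization flag only for the TEI variants.
        if base != "text/tei+xml":
            raise ValueError("tokenized is only defined for TEI variants")
        parts.append("tokenized=" + ("1" if tokenized else "0"))
    return "; ".join(parts)


print(build_type_string("tei-iso-spoken", tokenized=True))
print(build_type_string("folker-fln"))
```
</pre>
Keeping the table in one place means that a later renaming of a variant only has to be made once per service.<br>
<br>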
> On Thu, Jun 23, 2016 at 10:56 AM, Marie Hinrichs<br>
> <<a href="mailto:marie.hinrichs@uni-tuebingen.de">marie.hinrichs@uni-tuebingen.de</a>> wrote:<br>
>><br>
>> Hi All,<br>
>><br>
>> Thanks to all of you for all the work you’ve done so far to get TEI<br>
>> processing integrated into WebLicht.<br>
>><br>
>> From WebLicht’s side, there are several places where some<br>
>> work/coordination needs to happen:<br>
>><br>
>> 1. TCF: agree on the textsource.type attribute and make sure that the<br>
>> encoder services set it properly<br>
>> 2. Agree on type names (i.e. text/tei+xml or text/x-tei-dta-xml)<br>
>> 3. Make sure the CMDI for encoder and decoder services reflect outcomes of<br>
>> 1 and 2<br>
>> 4. Add new mappings to WebLicht for TEI.<br>
>><br>
>> Steps 1-3 are being worked out here on the mailing list and whichever<br>
>> solution/conventions you agree on are fine with us.<br>
>><br>
>> Step 4 requires some changes to the WebLicht code - in particular to the<br>
>> component that we call the “profiler”. When a user uploads a file, the<br>
>> profiler tries to figure out what it is and if any of the WebLicht services<br>
>> can process it. The contentType of the uploaded file, in combination with<br>
>> standard libraries for file type recognition, is used for this. But<br>
>> sometimes more digging is necessary, as in the case with tcf - which is<br>
>> recognized as xml, but it needs a closer look to see if it is tcf. The<br>
>> profiler will have to be updated in a similar way to recognize TEI, and<br>
>> hopefully there is even some straightforward way of distinguishing between<br>
>> the DTA and the spoken variants. Finally, mappings need to be established<br>
>> between the results of the profiler and the service input types so that the<br>
>> right services are offered to the user for selection.<br>
>><br>
>> Also note that WebLicht chains can be called from the command-line or<br>
>> programmatically using WebLicht as a Service (WaaS) - see instructions here:<br>
>> <a href="https://weblicht.sfs.uni-tuebingen.de/WaaS/" rel="noreferrer" target="_blank">https://weblicht.sfs.uni-tuebingen.de/WaaS/</a> This is useful for larger inputs<br>
>> and avoids timeout issues that arise when using the web interface.<br>
>><br>
>> Best Regards,<br>
>> Marie<br>
>><br>
>><br>
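The kind of closer look described for the profiler can be sketched in Python as follows. This is a guess at the general approach, not WebLicht's actual code: the function name profile_xml is hypothetical, the D-Spin namespace used to spot TCF is an assumption, "weblicht-tcf" is only the variant name proposed at the top of this thread, and the sketch makes no attempt to tell the DTA and spoken TEI variants apart, since the thread leaves open how that could be done.<br>
<pre>
```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Assumption: TCF documents have a root element in the D-Spin namespace.
TCF_NS = "http://www.dspin.de/data"


def profile_xml(data):
    """Guess a type string for XML input by inspecting the root element."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None  # not well-formed XML; leave it to other detectors
    if root.tag.startswith("{" + TCF_NS + "}"):
        return "text/xml; format-variant=weblicht-tcf"
    if root.tag.startswith("{" + TEI_NS + "}"):
        # Tokenized here means: at least one TEI w element is present.
        has_w = next(root.iter("{" + TEI_NS + "}w"), None) is not None
        return "text/tei+xml; tokenized=" + ("1" if has_w else "0")
    return "text/xml"
```
</pre>
A production profiler would also need to guard against hostile XML input, which the standard-library parser used here does not do.<br>
<br>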
>> On 21.06.2016, at 14:28, Tomaž Erjavec <<a href="mailto:Tomaz.Erjavec@ijs.si">Tomaz.Erjavec@ijs.si</a>> wrote:<br>
>><br>
>> Hi,<br>
>><br>
>> as regards<br>
>><br>
>> > these format-related specifications (in this case: the name and possible<br>
>> > values of attributes which are used in addition to a mime type) would<br>
>> > need to be documented and made known at a central place.<br>
>><br>
>> I'd say the documentation for each would need to be accompanied by its TEI<br>
>> schema, i.e. the TEI ODD file and the derived (probably) RelaxNG schema.<br>
>> Then it would be a simple matter to check if a document conforms to the mime<br>
>> type.<br>
>><br>
>> Best,<br>
>> Tomaž<br>
>><br>
>> On 21/06/2016 at 14:22, Bryan Jurish wrote:<br>
>><br>
>> morning all,<br>
>><br>
>> sounds good to me.<br>
>><br>
>> @Marie: can you give an estimation of how well this might work for<br>
>> WebLicht?<br>
>><br>
>> I'll add the "format-variant=tei-dta" parameter to the DTA TEI<->TCF web<br>
>> service in the next few days, so we can see how that at least works out.<br>
>><br>
>> marmosets,<br>
>> Bryan<br>
>><br>
>> On Tue, Jun 21, 2016 at 12:32 PM, Thomas Schmidt<br>
>> <<a href="mailto:thomas.schmidt@ids-mannheim.de">thomas.schmidt@ids-mannheim.de</a>> wrote:<br>
>>><br>
>>> Dear all,<br>
>>><br>
>>> revising my suggestions from the teiweblicht list according to Bryan's<br>
>>> proposal to use official mime-types plus parameters (instead of<br>
>>> x-extended custom mime types) would mean that:<br>
>>><br>
>>> "text/x-tei-isospoken+xml" could become "text/tei+xml;<br>
>>> format-variant=tei-iso-spoken" (+ tokenized=0/1)<br>
>>> "text/x-tei-dta+xml" could become "text/tei+xml;<br>
>>> format-variant=tei-dta" (+ tokenized=0/1)<br>
>>> "text/x-exmaralda-exb+xml" could become "text/xml;<br>
>>> format-variant=exmaralda-exb"<br>
>>> ... and so forth (for other TEI- or XML-based formats)<br>
>>><br>
>>> Wouldn't that be a Solomonic solution? What do the WebLicht developers<br>
>>> say? And independently of that, I think that Hanna is right that these<br>
>>> format-related specifications (in this case: the name and possible<br>
>>> values of attributes which are used in addition to a mime type) would<br>
>>> need to be documented and made known at a central place. I guess it<br>
>>> would be up to the standards committee to decide on that?<br>
>>><br>
>>> Best regards,<br>
>>><br>
>>> Thomas<br>
>>><br>
>>><br>
>>><br>
>>><br>
>>><br>
>>> On Sat, Jun 18, 2016 at 10:56 AM, Bryan Jurish <<a href="mailto:jurish@bbaw.de">jurish@bbaw.de</a>> wrote:<br>
>>> > moin all,<br>
>>> ><br>
>>> > fwiw, I agree with Dieter that we need to differentiate between "proper"<br>
>>> > MIME types (i.e. standardized conventions registered with IANA) and<br>
>>> > CLARIN-internal (or WebLicht-internal) conventions. We have been using<br>
>>> > MIME types as the basis of the WebLicht textSource/@type attribute,<br>
>>> > analogous to the HTTP "Content-Type" header, cf.<br>
>>> > <a href="https://tools.ietf.org/html/rfc2045#section-5.1" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc2045#section-5.1</a> . At the risk of repeating<br>
>>> > what I've already said on the tei-weblicht list, use of the Content-Type<br>
>>> > syntax allows us to have our cake and eat it too: we can go ahead and use<br>
>>> > "official" IANA-sanctioned "true" MIME types and specify variants<br>
>>> > ("dialects", "flavors") using parameters. The DTA TEI<->TCF converter is<br>
>>> > already doing this, setting textSource/@type to either "text/tei+xml;<br>
>>> > tokenized=0" or "text/tei+xml; tokenized=1", depending on the relevant<br>
>>> > properties of the input document.<br>
>>> ><br>
>>> > just my €0.02.<br>
>>> ><br>
>>> > marmosets,<br>
>>> > Bryan<br>
>>> ><br>
>>> ><br>
>>> > On Fri, Jun 17, 2016 at 1:43 PM, Dieter Van Uytvanck <<a href="mailto:dieter@clarin.eu">dieter@clarin.eu</a>><br>
>>> > wrote:<br>
>>> >><br>
>>> >> On 17/06/16 12:59, Sander Maijers wrote:<br>
>>> >> > After all, you would want a<br>
>>> >> > resource's metadata to be completely descriptive of such elementary<br>
>>> >> > aspects as internal structure and content of the TEI files, and not<br>
>>> >> > dependent on system configuration (served as custom media type x or y,<br>
>>> >> > as long as the server remains so configured).<br>
>>> >><br>
>>> >> Hi Sander,<br>
>>> >><br>
>>> >> Thank you for sharing your opinion.<br>
>>> >><br>
>>> >> One side note: we are talking about detecting the mimetype as indicated<br>
>>> >> in the CMDI ResourceProxy attribute, see:<br>
>>> >><br>
>>> >><br>
>>> >><br>
>>> >> <a href="https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy" rel="noreferrer" target="_blank">https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy</a><br>
>>> >><br>
>>> >> So for the scenario VLO -> LR switchboard -> processing application<br>
>>> >><br>
>>> >> the system configuration would not be relevant, since the mimetype is<br>
>>> >> explicitly mentioned in the metadata. The key is to find agreement about<br>
>>> >> a simple and light-weight way of designating the variants of TEI.<br>
>>> >><br>
>>> >> best,<br>
>>> >><br>
>>> >> --<br>
>>> >> Dieter Van Uytvanck<br>
>>> >> Technical Director CLARIN ERIC<br>
>>> >> <a href="http://www.clarin.eu" rel="noreferrer" target="_blank">www.clarin.eu</a> | tel. <a href="tel:%2B31-%280%29850091363" value="+31850091363">+31-(0)850091363</a> | skype: dietervu.mpi<br>
>>> >> _______________________________________________<br>
>>> >> Teiweblicht mailing list<br>
>>> >> <a href="mailto:Teiweblicht@lists.informatik.uni-leipzig.de">Teiweblicht@lists.informatik.uni-leipzig.de</a><br>
>>> >> <a href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht" rel="noreferrer" target="_blank">http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht</a><br>
>>> >><br>
>>> ><br>
>>> ><br>
>>> ><br>
>>> > --<br>
>>> > ***************************************************<br>
>>> > Bryan Jurish<br>
>>> > Deutsches Textarchiv<br>
>>> > Digitales Wörterbuch der deutschen Sprache<br>
>>> > Berlin-Brandenburgische Akademie der Wissenschaften<br>
>>> ><br>
>>> > Jägerstr. 22/23<br>
>>> > 10117 Berlin<br>
>>> ><br>
>>> > Tel.: <a href="tel:%2B49%20%280%2930%2020370%20539" value="+493020370539">+49 (0)30 20370 539</a><br>
>>> > E-Mail: <a href="mailto:jurish@bbaw.de">jurish@bbaw.de</a><br>
>>> > ***************************************************<br>
>>> ><br>
>>> ><br>
>>><br>
>>><br>
>>><br>
>>> --<br>
>>> Thomas Schmidt<br>
>>> IDS Mannheim<br>
>>> R5, 6-13<br>
>>> D-68161 Mannheim<br>
>>> Tel.: <a href="tel:%2B49%20%28621%29%201581-313" value="+496211581313">+49 (621) 1581-313</a><br>
>>> <a href="http://agd.ids-mannheim.de/index.shtml" rel="noreferrer" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br>
>>> <a href="http://www.exmaralda.org" rel="noreferrer" target="_blank">http://www.exmaralda.org</a><br>
>>><br>
>><br>
>><br>
>><br>
>> --<br>
>> ***************************************************<br>
>> Bryan Jurish<br>
>> Deutsches Textarchiv<br>
>> Digitales Wörterbuch der deutschen Sprache<br>
>> Berlin-Brandenburgische Akademie der Wissenschaften<br>
>><br>
>> Jägerstr. 22/23<br>
>> 10117 Berlin<br>
>><br>
>> Tel.: <a href="tel:%2B49%20%280%2930%2020370%20539" value="+493020370539">+49 (0)30 20370 539</a><br>
>> E-Mail: <a href="mailto:jurish@bbaw.de">jurish@bbaw.de</a><br>
>> ***************************************************<br>
>><br>
>><br>
>><br>
><br>
><br>
><br>
> --<br>
> Thomas Schmidt<br>
> IDS Mannheim<br>
> R5, 6-13<br>
> D-68161 Mannheim<br>
> Tel.: <a href="tel:%2B49%20%28621%29%201581-313" value="+496211581313">+49 (621) 1581-313</a><br>
> <a href="http://agd.ids-mannheim.de/index.shtml" rel="noreferrer" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br>
> <a href="http://www.exmaralda.org" rel="noreferrer" target="_blank">http://www.exmaralda.org</a><br>
<br>
<br>
<br>
--<br>
Thomas Schmidt<br>
IDS Mannheim<br>
R5, 6-13<br>
D-68161 Mannheim<br>
Tel.: <a href="tel:%2B49%20%28621%29%201581-313" value="+496211581313">+49 (621) 1581-313</a><br>
<a href="http://agd.ids-mannheim.de/index.shtml" rel="noreferrer" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br>
<a href="http://www.exmaralda.org" rel="noreferrer" target="_blank">http://www.exmaralda.org</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Thomas Schmidt<br>IDS Mannheim<br>R5, 6-13<br>D-68161 Mannheim<br>Tel.: +49 (621) 1581-313<br><a href="http://agd.ids-mannheim.de/index.shtml" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br><a href="http://www.exmaralda.org" target="_blank">http://www.exmaralda.org</a></div></div></div>
</div>