<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi All,<div class=""><br class=""></div><div class="">Thanks to all of you for all the work you’ve done so far to get TEI processing integrated into WebLicht.</div><div class=""><br class=""></div><div class="">From WebLicht’s side, there are several places where some work/coordination needs to happen:</div><div class=""><br class=""></div><div class="">1. TCF: agree on the textsource.type attribute and make sure that the encoder services set it properly</div><div class="">2. Agree on type names (e.g. text/tei+xml or text/x-tei-dta+xml)</div><div class="">3. Make sure the CMDI for encoder and decoder services reflects the outcomes of 1 and 2</div><div class="">4. Add new mappings to WebLicht for TEI.</div><div class=""><br class=""></div><div class="">Steps 1-3 are being worked out here on the mailing list, and whichever solution or conventions you agree on will be fine with us.</div><div class=""><br class=""></div><div class="">Step 4 requires some changes to the WebLicht code - in particular to the component that we call the “profiler”. When a user uploads a file, the profiler tries to figure out what it is and whether any of the WebLicht services can process it. The contentType of the uploaded file, in combination with standard libraries for file type recognition, is used for this. But sometimes more digging is necessary, as in the case of TCF, which is recognized as XML but needs a closer look to confirm that it is TCF. The profiler will have to be updated in a similar way to recognize TEI, and hopefully there is even some straightforward way of distinguishing between the DTA and the spoken variants. 
Finally, mappings need to be established between the results of the profiler and the service input types so that the right services are offered to the user for selection.</div><div class=""><br class=""></div><div class="">Also note that WebLicht chains can be called from the command line or programmatically using WebLicht as a Service (WaaS); see the instructions here: <a href="https://weblicht.sfs.uni-tuebingen.de/WaaS/" class="">https://weblicht.sfs.uni-tuebingen.de/WaaS/</a>. This is useful for larger inputs and avoids the timeout issues that can arise when using the web interface.</div><div class=""><br class=""></div><div class="">Best Regards,</div><div class="">Marie</div><div class=""><br class=""></div><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On 21.06.2016, at 14:28, Tomaž Erjavec <<a href="mailto:Tomaz.Erjavec@ijs.si" class="">Tomaz.Erjavec@ijs.si</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">
<div bgcolor="#FFFFFF" text="#000000" class=""><p class="">Hi,</p><p class="">as regards <br class="">
</p><p class="">> these format-related specifications (in this case: the name
and possible<br class="">
> values of attributes which are used in addition to a mime
type) would<br class="">
> need to be documented and made known at a central place. <br class="">
</p>
I'd say the documentation for each variant would need to be accompanied by
its TEI schema, i.e. the TEI ODD file and the schema derived from it
(probably RELAX NG). Then it would be a simple matter to check whether a
document conforms to the declared MIME type.<br class="">
<br class="">
Best,<br class="">
Tomaž<br class="">
<br class="">
<div class="moz-cite-prefix">On 21/06/2016 at 14:22,
Bryan Jurish wrote:<br class="">
</div>
<blockquote cite="mid:CAMg255yjd-m3qrd-yNoLF+8PLitQNXw8jfe0ohQQ1suzzZ4U7g@mail.gmail.com" type="cite" class="">
<div dir="ltr" class="">morning all,
<div class=""><br class="">
</div>
<div class="">sounds good to me.</div>
<div class=""><br class="">
</div>
<div class="">@Marie: can you give an estimate of how well this might
work for WebLicht?</div>
<div class=""><br class="">
</div>
<div class="">I'll add the "format-variant=tei-dta" parameter to the DTA
TEI<->TCF web service in the next few days, so we can
at least see how that works out.</div>
<div class=""><br class="">
</div>
<div class="">marmosets,</div>
<div class=""> Bryan</div>
</div>
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Tue, Jun 21, 2016 at 12:32 PM,
Thomas Schmidt <span dir="ltr" class=""><<a moz-do-not-send="true" href="mailto:thomas.schmidt@ids-mannheim.de" target="_blank" class="">thomas.schmidt@ids-mannheim.de</a>></span>
wrote:<br class="">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br class="">
<br class="">
revising my suggestions from the teiweblicht list according
to Bryan's<br class="">
proposal to use official mime-types plus parameters (instead
of<br class="">
x-extended custom mime types) would mean that:<br class="">
<br class="">
"text/x-tei-isospoken+xml" could become "text/tei+xml;<br class="">
format-variant=tei-iso-spoken" (+ tokenized=0/1)<br class="">
"text/x-tei-dta+xml" could become "text/tei+xml;<br class="">
format-variant=tei-dta" (+ tokenized=0/1)<br class="">
"text/x-exmaralda-exb+xml" could become "text/xml;
format-variant=exmaralda-exb"<br class="">
... and so forth (for other TEI- or XML-based formats)<br class="">
<br class="">
Wouldn't that be a Solomonic solution? What do the WebLicht
developers<br class="">
say? And independently of that, I think that Hanna is right
that these<br class="">
format-related specifications (in this case: the name and
possible<br class="">
values of attributes which are used in addition to a mime
type) would<br class="">
need to be documented and made known at a central place. I
guess it<br class="">
would be up to the standards committee to decide on that?<br class="">
<br class="">
Best regards,<br class="">
<br class="">
Thomas<br class="">
<div class="HOEnZb">
<div class="h5"><br class="">
<br class="">
<br class="">
<br class="">
<br class="">
On Sat, Jun 18, 2016 at 10:56 AM, Bryan Jurish <<a class="moz-txt-link-abbreviated" href="mailto:jurish@bbaw.de">jurish@bbaw.de</a>>
wrote:<br class="">
> moin all,<br class="">
><br class="">
> fwiw, I agree with Dieter that we need to
differentiate between "proper"<br class="">
> MIME types (i.e. standardized conventions
registered with IANA) and<br class="">
> CLARIN-internal (or WebLicht-internal)
conventions. We have been using<br class="">
> MIME types as the basis of the WebLicht
textSource/@type attribute,<br class="">
> analogous to the HTTP "Content-Type" header, cf.<br class="">
> <a moz-do-not-send="true" href="https://tools.ietf.org/html/rfc2045#section-5.1" rel="noreferrer" target="_blank" class="">https://tools.ietf.org/html/rfc2045#section-5.1</a>
. At the risk of repeating<br class="">
> what I've already said on the tei-weblicht list,
use of the Content-Type<br class="">
> syntax allows us to have our cake and eat it too:
we can go ahead and use<br class="">
> "official" IANA-sanctioned "true" MIME types and
specify variants<br class="">
> ("dialects", "flavors") using parameters. The DTA
TEI<->TCF converter is<br class="">
> already doing this, setting textSource/@type to
either "text/tei+xml;<br class="">
> tokenized=0" or "text/tei+xml; tokenized=1",
depending on the relevant<br class="">
> properties of the input document.<br class="">
><br class="">
> just my €0.02.<br class="">
><br class="">
> marmosets,<br class="">
> Bryan<br class="">
><br class="">
><br class="">
> On Fri, Jun 17, 2016 at 1:43 PM, Dieter Van
Uytvanck <<a moz-do-not-send="true" href="mailto:dieter@clarin.eu" class="">dieter@clarin.eu</a>><br class="">
> wrote:<br class="">
>><br class="">
>> On 17/06/16 12:59, Sander Maijers wrote:<br class="">
>> > After all, you would want a<br class="">
>> > resource's metadata to be completely
descriptive of such elementary<br class="">
>> > aspects as internal structure and content
of the TEI files, and not<br class="">
>> > dependent on system configuration (served
as custom media type x or y,<br class="">
>> > as long as the server remains so
configured).<br class="">
>><br class="">
>> Hi Sander,<br class="">
>><br class="">
>> Thank you for sharing your opinion.<br class="">
>><br class="">
>> One side note: we are talking about detecting
the mimetype as indicated<br class="">
>> in the CMDI ResourceProxy attribute, see:<br class="">
>><br class="">
>><br class="">
>> <a moz-do-not-send="true" href="https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy" rel="noreferrer" target="_blank" class="">https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy</a><br class="">
>><br class="">
>> So for the scenario VLO -> LR switchboard
-> processing application<br class="">
>><br class="">
>> the system configuration would not be relevant,
since the mimetype is<br class="">
>> explicitly mentioned in the metadata. The key
is to find agreement about<br class="">
>> a simple and light-weight way of designating
the variants of TEI.<br class="">
>><br class="">
>> best,<br class="">
>><br class="">
>> --<br class="">
>> Dieter Van Uytvanck<br class="">
>> Technical Director CLARIN ERIC<br class="">
>> <a moz-do-not-send="true" href="http://www.clarin.eu/" rel="noreferrer" target="_blank" class="">www.clarin.eu</a> | tel. <a moz-do-not-send="true" href="tel:%2B31-%280%29850091363" value="+31850091363" class="">+31-(0)850091363</a>
| skype: dietervu.mpi<br class="">
>> _______________________________________________<br class="">
>> Teiweblicht mailing list<br class="">
>> <a moz-do-not-send="true" href="mailto:Teiweblicht@lists.informatik.uni-leipzig.de" class="">Teiweblicht@lists.informatik.uni-leipzig.de</a><br class="">
>> <a moz-do-not-send="true" href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht" rel="noreferrer" target="_blank" class="">http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht</a><br class="">
>><br class="">
><br class="">
><br class="">
><br class="">
> --<br class="">
> ***************************************************<br class="">
> Bryan Jurish<br class="">
> Deutsches Textarchiv<br class="">
> Digitales Wörterbuch der deutschen Sprache<br class="">
> Berlin-Brandenburgische Akademie der Wissenschaften<br class="">
><br class="">
> Jägerstr. 22/23<br class="">
> 10117 Berlin<br class="">
><br class="">
> Tel.: <a moz-do-not-send="true" href="tel:%2B49%20%280%2930%2020370%20539" value="+493020370539" class="">+49 (0)30 20370 539</a><br class="">
> E-Mail: <a moz-do-not-send="true" href="mailto:jurish@bbaw.de" class="">jurish@bbaw.de</a><br class="">
> ***************************************************<br class="">
><br class="">
><br class="">
<br class="">
<br class="">
<br class="">
--<br class="">
</div>
</div>
<div class="HOEnZb">
<div class="h5">Thomas Schmidt<br class="">
IDS Mannheim<br class="">
R5, 6-13<br class="">
D-68161 Mannheim<br class="">
Tel.: <a moz-do-not-send="true" href="tel:%2B49%20%28621%29%201581-313" value="+496211581313" class="">+49 (621) 1581-313</a><br class="">
<a moz-do-not-send="true" href="http://agd.ids-mannheim.de/index.shtml" rel="noreferrer" target="_blank" class="">http://agd.ids-mannheim.de/index.shtml</a><br class="">
<a moz-do-not-send="true" href="http://www.exmaralda.org/" rel="noreferrer" target="_blank" class="">http://www.exmaralda.org</a><br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
<br clear="all" class="">
<div class=""><br class="">
</div>
-- <br class="">
<div class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr" class="">***************************************************<br class="">
Bryan Jurish<br class="">
Deutsches Textarchiv
<div class="">Digitales Wörterbuch der deutschen Sprache
<div class="">
<div class="">Berlin-Brandenburgische Akademie der Wissenschaften<br class="">
<br class="">
Jägerstr. 22/23<br class="">
10117 Berlin<br class="">
<br class="">
Tel.: +49 (0)30 20370 539<br class="">
E-Mail: <a moz-do-not-send="true" href="mailto:jurish@bbaw.de" target="_blank" class="">jurish@bbaw.de</a><br class="">
***************************************************</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br class="">
</div>
</div></blockquote></div><br class=""></body></html>