<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Dear Thomas and All,<br>
<br>
[I'm a bit afraid that my message may bounce off the teiweblicht
list, since I'm not a member, but let me try and count on the list
administrators' "ok" for the attempt]<br>
<br>
It has taken me a while to go through this thread, together with its
branches into the Net via the quoted URLs. I'm quite taken with the
format-variant solution; it does seem the neatest of all for
processing purposes. In the meantime I've been dusting off some old
TEI tickets to see how I could suggest formalizing this on the TEI
side, though that is of course a slightly different issue and task.
It is a very new one, too, from what I can see: apart from the quick
round of applause after the introduction of the MIME type, not much
followed, at least in the official channels.<br>
<br>
I do not presume the authority to speak on behalf of the Standards
Committee in this respect. In my own view, let me repeat, the
solution seems brilliant. As far as the Standards Committee is
concerned, and with apologies to those who have already seen this
in the Standards mailing list, I would like to repeat my commitment
to provide a proposal for a unified set of standards documents that
CLARIN centres could advocate. Again, I hasten to stress that this
is not meant as a revolution, but rather as taking stock of the
inventory: seeing what has become obsolete beyond embarrassment and
what has appeared on the scene since the Short Guides and other
proposals for standardization within CLARIN were created. Jan Odijk
has in the meantime approached the issue from a slightly different
but closely related angle, which gives me a good starting point, and
I am happy to have received some important backchannel support for
this initiative as well. Andreas Witt and I will present the
proposal (we sometimes cautiously call it a sketch) in
Aix-en-Provence, and will circulate it beforehand among the
Standards group and other interested parties. Naturally, we count on
the support and advice of the Centres Committee in this endeavour.<br>
<br>
Also, standardization is not about putting the stamp of one
committee or another on a proposal; rather, it consists in
recognizing and promoting existing good practices. So I would say:
let's put this idea into practice and see if it works (I guess we're
all pretty optimistic about that), and I will be happy to document
it as a working practice that can constitute the basis for
standardization. While I do that, I'll keep the TEI part of my brain
and life in sync. This is a fortunate moment to raise the matter,
given the approaching TEI meeting: it gives an opportunity to seed
the TEI Technical Council's consciousness with these ideas, in the
hope that they ripen enough by the next release cycle to be
reflected in the TEI documents as well.<br>
<br>
It seems like some exciting weeks may lie ahead. In the meantime,
I wish everyone a good weekend (and some of us a good show on
Sunday ;-)).<br>
<br>
Best regards,<br>
<br>
Piotr Banski<br>
<br>
<br>
On 08/07/16 09:04, Thomas Schmidt wrote:<br>
</div>
<blockquote
cite="mid:CAD74COkhyq3At+++_7b5mbv4dw0AnopszfWc-CcB0AApE3M+Jg@mail.gmail.com"
type="cite">
<div dir="ltr">Dear all,
<div><br>
</div>
<div>in the absence of further input from the standards
committee and before we lose the momentum, I'd like to
summarise our action plan according to the discussion so far:</div>
<div><br>
</div>
<div>(1a) In WebLicht (in CLARIN in general?) ISO/TEI
transcriptions of spoken language will be identified by the
MIME type <span style="font-size:12.8px">text/tei+xml;</span><span
style="font-size:12.8px">format-variant=tei-iso-spoken. A
parameter "token=0/1" can be added to indicate whether (=1)
or not (=0) the respective TEI file is tokenized (i.e. has
&lt;w&gt; markup).</span></div>
<div>
<div>(1b) HZSK and I will <span style="font-size:12.8px">adapt
the respective web services accordingly</span><br>
</div>
<div><br>
</div>
<div>(2a) In WebLicht (in CLARIN in general?) DTA/TEI files
will be identified by the MIME type <span
style="font-size:12.8px">text/tei+xml;</span><span
style="font-size:12.8px">format-variant=tei-dta. </span><span
style="font-size:12.8px">A parameter "token=0/1" can be
added to indicate whether (=1) or not (=0) the respective
TEI file is tokenized (i.e. has &lt;w&gt; markup).</span></div>
</div>
<div><span style="font-size:12.8px">(2b) Bryan Jurish will adapt
the respective web services at BBAW accordingly</span><br>
</div>
<div><br>
</div>
<div>(3a) In WebLicht (in CLARIN in general?), EXMARaLDA Basic
Transcriptions will be identified by the MIME type <span
style="font-size:12.8px">text/xml;
format-variant=exmaralda-exb</span><span
style="font-size:12.8px"><br>
</span></div>
<div>
<div>(3b) In WebLicht (in CLARIN in general?),
FOLKER/OrthoNormal transcription files will be identified by
the MIME type <span style="font-size:12.8px">text/xml;
format-variant=folker-fln</span><span
style="font-size:12.8px"><br>
</span></div>
</div>
<div>
<div>(3c) In WebLicht (in CLARIN in general?), Transcriber
transcription files will be identified by the MIME type <span
style="font-size:12.8px">text/xml;
format-variant=transcriber-trs</span><span
style="font-size:12.8px"><br>
</span></div>
</div>
<div><span style="font-size:12.8px">(3d) </span>HZSK and I
will <span style="font-size:12.8px">adapt the respective web
services accordingly</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
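<div><span style="font-size:12.8px">As a rough illustration of how a consumer could read the type strings in (1a) to (3c), here is a minimal Python sketch; it is not the WebLicht, HZSK or BBAW code. The function names are hypothetical, quoted parameter values are handled only naively, and the parameter name "token" follows this summary, whereas the earlier messages in this thread use "tokenized".</span></div>
<pre>
```python
def parse_media_type(value):
    """Split a Content-Type style string into (media type, parameter dict).

    Assumption: parameters are separated by ';' and contain no quoted
    semicolons, which holds for the type strings proposed in this thread.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing ';'
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def describe(value):
    """Return (media type, format variant, tokenized flag or None)."""
    media_type, params = parse_media_type(value)
    variant = params.get("format-variant")
    # "token" is the parameter name used in this summary (hypothetical choice).
    tokenized = {"1": True, "0": False}.get(params.get("token"))
    return media_type, variant, tokenized
```
</pre>
<div><span style="font-size:12.8px">Calling describe on "text/tei+xml; format-variant=tei-dta; token=0" would then give the base type, the variant "tei-dta" and a tokenized flag of False, while a string without the optional parameter leaves the flag undetermined.</span></div>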
<div><span style="font-size:12.8px">(4a) It would have to be
checked (note the passive, I don't know who could be in
charge of this) whether competing MIME types for these file
types are already registered somewhere. I know that WebLicht
already seems to have two variants of EXMARaLDA
transcriptions. The mechanisms specifying those would
probably have to be deprecated. Transcriber is also not
unlikely to have been given some kind of mimetype elsewhere
in CLARIN.</span></div>
<div><span style="font-size:12.8px">(4b) Further relevant
formats will be ELAN/EAF, CLAN/CHA and PRAAT/TextGrid (the
latter two being text, not XML formats). All three of them
are also likely to have been registered somewhere already,
so "someone" (again, I wouldn't know who) should check if
mime types have been defined for those.</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px">I guess that this is as good
an answer as we can currently give to address points 1-3 in
Marie Hinrichs' list. @Marie: can you confirm that this is
sufficient for you, also to address point 4 in your list? In
my understanding, whatever works for WebLicht in this
respect should also be a suitable basis for a larger context
(the SwitchBoard in particular?). </span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px">In my eyes, it remains
crucial, however, that such standardization "decisions" are
centrally documented (including the information </span><span
style="font-size:12.8px">Tomaž suggested).</span><span
style="font-size:12.8px"> The CLARIN standards pages as they
are now (<a moz-do-not-send="true"
href="https://www.clarin.eu/content/standard-recommendations">https://www.clarin.eu/content/standard-recommendations</a>
/ <a moz-do-not-send="true"
href="http://clarin.ids-mannheim.de/standards/index.xq">http://clarin.ids-mannheim.de/standards/index.xq</a>
are the ones I know) are, IMHO, incomplete, inconsistent and
outdated, and they certainly do not provide accurate
information on the mime types. Any input from the standards
committee on this question would therefore still be much
appreciated.</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px">Best,</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px">Thomas</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
<div><span style="font-size:12.8px"><br>
</span></div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jun 23, 2016 at 10:56 AM, Marie
Hinrichs <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:marie.hinrichs@uni-tuebingen.de"
target="_blank">marie.hinrichs@uni-tuebingen.de</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Hi All,
<div><br>
</div>
<div>Thanks to all of you for all the work you’ve done so
far to get TEI processing integrated into WebLicht.</div>
<div><br>
</div>
<div>From WebLicht’s side, there are several places where
some work/coordination needs to happen:</div>
<div><br>
</div>
<div>1. TCF: agree on the textsource.type attribute and
make sure that the encoder services set it properly</div>
<div>2. Agree on type names (i.e. text/tei+xml or
text/x-tei-dta-xml)</div>
<div>3. Make sure the CMDI for encoder and decoder
services reflect outcomes of 1 and 2</div>
<div>4. Add new mappings to WebLicht for TEI.</div>
<div><br>
</div>
<div>Steps 1-3 are being worked out here on the mailing
list and whichever solution/conventions you agree on are
fine with us.</div>
<div><br>
</div>
<div>Step 4 requires some changes to the WebLicht code -
in particular to the component that we call the
“profiler”. When a user uploads a file, the profiler
tries to figure out what it is and if any of the
WebLicht services can process it. The contentType of the
uploaded file, in combination with standard libraries
for file type recognition, is used for this. But
sometimes more digging is necessary, as in the case with
tcf - which is recognized as xml, but it needs a closer
look to see if it is tcf. The profiler will have to be
updated in a similar way to recognize TEI, and hopefully
there is even some straightforward way of distinguishing
between the DTA and the spoken variants. Finally,
mappings need to be established between the results of
the profiler and the service input types so that the
right services are offered to the user for selection.</div>
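<div>The "closer look" described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual profiler code: the function names are hypothetical, only the root element is inspected, and no attempt is made to tell the DTA and spoken variants apart, since the thread leaves open how that could be done.</div>
<pre>
```python
import io
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def sniff_root(xml_bytes):
    """Return (namespace, local name) of the root element.

    Stops at the first start event, so the rest of the file is not
    turned into a tree. Raises ET.ParseError for non-XML input.
    """
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start",))
    for _event, elem in events:
        tag = elem.tag
        if tag.startswith("{"):
            ns, _, local = tag[1:].partition("}")
            return ns, local
        return "", tag
    return None


def guess_media_type(xml_bytes):
    """Map the root namespace to a base type string (hypothetical mapping)."""
    ns, _local = sniff_root(xml_bytes)
    if ns == TEI_NS:
        return "text/tei+xml"
    return "text/xml"
```
</pre>
<div>A mapping from the result of such a check to the service input types would still have to be added on top; the sketch covers only the recognition step.</div>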
<div><br>
</div>
<div>Also note that WebLicht chains can be called from the
command-line or programmatically using WebLicht as a
Service (WaaS) - see instructions here: <a
moz-do-not-send="true"
href="https://weblicht.sfs.uni-tuebingen.de/WaaS/"
target="_blank">https://weblicht.sfs.uni-tuebingen.de/WaaS/</a>. This
is useful for larger inputs and avoids timeout issues
that arise when using the web interface.</div>
<div><br>
</div>
<div>Best Regards,</div>
<div>Marie</div>
<div>
<div class="h5">
<div><br>
</div>
<div><br>
</div>
<div>
<blockquote type="cite">
<div>On 21.06.2016, at 14:28, Tomaž Erjavec &lt;<a
moz-do-not-send="true"
href="mailto:Tomaz.Erjavec@ijs.si"
target="_blank">Tomaz.Erjavec@ijs.si</a>&gt;
wrote:</div>
<br>
<div>
<div bgcolor="#FFFFFF" text="#000000">
<p>Hi,</p>
<p>as regards <br>
</p>
<p>> these format-related specifications
(in this case: the name and possible<br>
> values of attributes which are used in
addition to a mime type) would<br>
> need to be documented and made known at
a central place. <br>
</p>
I'd say the documentation for each would need
to be accompanied by its TEI schema, i.e. the
TEI ODD file and the derived (probably)
RelaxNG schema. Then it would be a simple
matter to check if a document conforms to the
mime type.<br>
<br>
Best,<br>
Tomaž<br>
<br>
<div>On 21/06/2016 at 14:22, Bryan Jurish
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">morning all,
<div><br>
</div>
<div>sounds good to me.</div>
<div><br>
</div>
<div>@Marie: can you give an estimation of
how well this might work for WebLicht?</div>
<div><br>
</div>
<div>I'll add the "format-variant=tei-dta"
parameter to the DTA TEI<->TCF web
service in the next few days, so we can
see how that at least works out.</div>
<div><br>
</div>
<div>marmosets,</div>
<div> Bryan</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Jun 21,
2016 at 12:32 PM, Thomas Schmidt <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:thomas.schmidt@ids-mannheim.de"
target="_blank">thomas.schmidt@ids-mannheim.de</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Dear all,<br>
<br>
revising my suggestions from the
teiweblicht list according to Bryan's<br>
proposal to use official mime-types
plus parameters (instead of<br>
x-extended custom mime types) would
mean that:<br>
<br>
"text/x-tei-isospoken+xml" could
become "text/tei+xml;<br>
format-variant=tei-iso-spoken" (+
tokenized=0/1)<br>
"text/x-tei-dta+xml" could become
"text/tei+xml;<br>
format-variant=tei-dta" (+
tokenized=0/1)<br>
"text/x-exmaralda-exb+xml" could
become "text/xml;
format-variant=exmaralda-exb"<br>
... and so forth (for other TEI- or
XML-based formats)<br>
<br>
Wouldn't that be a Solomonic solution?
What do the WebLicht developers<br>
say? And independently of that, I
think that Hanna is right that these<br>
format-related specifications (in this
case: the name and possible<br>
values of attributes which are used in
addition to a mime type) would<br>
need to be documented and made known
at a central place. I guess it<br>
would be up to the standards committee
to decide on that?<br>
<br>
Best regards,<br>
<br>
Thomas<br>
<div>
<div><br>
<br>
<br>
<br>
<br>
On Sat, Jun 18, 2016 at 10:56 AM,
Bryan Jurish &lt;<a
moz-do-not-send="true"
href="mailto:jurish@bbaw.de"
target="_blank">jurish@bbaw.de</a>&gt;
wrote:<br>
> moin all,<br>
><br>
> fwiw, I agree with Dieter
that we need to differentiate
between "proper"<br>
> MIME types (i.e. standardized
conventions registered with IANA)
and<br>
> CLARIN-internal (resp.
WebLicht-internal) conventions.
We have been using<br>
> MIME types as the basis of
the WebLicht textSource/@type
attribute,<br>
> analogous to the HTTP
"Content-Type" header, cf.<br>
> <a moz-do-not-send="true"
href="https://tools.ietf.org/html/rfc2045#section-5.1"
rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc2045#section-5.1</a>
. At the risk of repeating<br>
> what I've already said on the
tei-weblicht list, use of the
Content-Type<br>
> syntax allows us to have our
cake and eat it too: we can go
ahead and use<br>
> "official" IANA-sanctioned
"true" MIME types and specify
variants<br>
> ("dialects", "flavors") using
parameters. The DTA
TEI<->TCF converter is<br>
> already doing this, setting
textSource/@type to either
"text/tei+xml;<br>
> tokenized=0" or
"text/tei+xml; tokenized=1",
depending on the relevant<br>
> properties of the input
document.<br>
><br>
> just my €0.02.<br>
><br>
> marmosets,<br>
> Bryan<br>
><br>
><br>
> On Fri, Jun 17, 2016 at 1:43
PM, Dieter Van Uytvanck &lt;<a
moz-do-not-send="true"
href="mailto:dieter@clarin.eu"
target="_blank">dieter@clarin.eu</a>&gt;<br>
> wrote:<br>
>><br>
>> On 17/06/16 12:59, Sander
Maijers wrote:<br>
>> > After all, you would
want a<br>
>> > resource's metadata
to be completely descriptive of
such elementary<br>
>> > aspects as internal
structure and content of the TEI
files, and not<br>
>> > dependent on system
configuration (served as custom
media type x or y,<br>
>> > as long as the
server remains so configured).<br>
>><br>
>> Hi Sander,<br>
>><br>
>> Thank you for sharing
your opinion.<br>
>><br>
>> One side note: we are
talking about detecting the
mimetype as indicated<br>
>> in the CMDI ResourceProxy
attribute, see:<br>
>><br>
>><br>
>> <a
moz-do-not-send="true"
href="https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy"
rel="noreferrer" target="_blank">https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy</a><br>
>><br>
>> So for the scenario VLO
-> LR switchboard ->
processing application<br>
>><br>
>> the system configuration
would not be relevant, since the
mimetype is<br>
>> explicitly mentioned in
the metadata. The key is to find
agreement about<br>
>> a simple and light-weight
way of designating the variants of
TEI.<br>
>><br>
>> best,<br>
>><br>
>> --<br>
>> Dieter Van Uytvanck<br>
>> Technical Director CLARIN
ERIC<br>
>> <a
moz-do-not-send="true"
href="http://www.clarin.eu/"
rel="noreferrer" target="_blank">www.clarin.eu</a>
| tel. <a moz-do-not-send="true"
href="tel:%2B31-%280%29850091363" value="+31850091363" target="_blank">+31-(0)850091363</a>
| skype: dietervu.mpi<br>
>>
_______________________________________________<br>
>> Teiweblicht mailing list<br>
>> <a
moz-do-not-send="true"
href="mailto:Teiweblicht@lists.informatik.uni-leipzig.de"
target="_blank">Teiweblicht@lists.informatik.uni-leipzig.de</a><br>
>> <a
moz-do-not-send="true"
href="http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht"
rel="noreferrer" target="_blank">http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht</a><br>
>><br>
><br>
><br>
><br>
> --<br>
>
***************************************************<br>
> Bryan Jurish<br>
> Deutsches Textarchiv<br>
> Digitales Wörterbuch der
deutschen Sprache<br>
> Berlin-Brandenburgische
Akademie der Wissenschaften<br>
><br>
> Jägerstr. 22/23<br>
> 10117 Berlin<br>
><br>
> Tel.: <a
moz-do-not-send="true"
href="tel:%2B49%20%280%2930%2020370%20539"
value="+493020370539"
target="_blank">+49 (0)30 20370
539</a><br>
> E-Mail: <a
moz-do-not-send="true"
href="mailto:jurish@bbaw.de"
target="_blank">jurish@bbaw.de</a><br>
>
***************************************************<br>
><br>
<br>
<br>
<br>
--<br>
</div>
</div>
<div>
<div>Thomas Schmidt<br>
IDS Mannheim<br>
R5, 6-13<br>
D-68161 Mannheim<br>
Tel.: <a moz-do-not-send="true"
href="tel:%2B49%20%28621%29%201581-313"
value="+496211581313"
target="_blank">+49 (621)
1581-313</a><br>
<a moz-do-not-send="true"
href="http://agd.ids-mannheim.de/index.shtml"
rel="noreferrer" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br>
<a moz-do-not-send="true"
href="http://www.exmaralda.org/"
rel="noreferrer" target="_blank">http://www.exmaralda.org</a><br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div data-smartmail="gmail_signature">
<div dir="ltr">***************************************************<br>
Bryan Jurish<br>
Deutsches Textarchiv
<div>Digitales Wörterbuch der
deutschen Sprache
<div>
<div>Berlin-Brandenburgische
Akademie der Wissenschaften<br>
<br>
Jägerstr. 22/23<br>
10117 Berlin<br>
<br>
Tel.: <a
moz-do-not-send="true"
href="tel:%2B49%20%280%2930%2020370%20539"
value="+493020370539"
target="_blank">+49 (0)30
20370 539</a><br>
E-Mail: <a
moz-do-not-send="true"
href="mailto:jurish@bbaw.de"
target="_blank">jurish@bbaw.de</a><br>
***************************************************</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">
<div>Thomas Schmidt<br>
IDS Mannheim<br>
R5, 6-13<br>
D-68161 Mannheim<br>
Tel.: +49 (621) 1581-313<br>
<a moz-do-not-send="true"
href="http://agd.ids-mannheim.de/index.shtml"
target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br>
<a moz-do-not-send="true" href="http://www.exmaralda.org"
target="_blank">http://www.exmaralda.org</a></div>
</div>
</div>
</div>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Piotr Bański, Ph.D.
Senior Researcher,
Institut für Deutsche Sprache,
R5 6-13
68-161 Mannheim, Germany</pre>
</body>
</html>