<div dir="ltr">Dear all,<div><br></div><div>thanks, Dieter and Piotr, for pointing out the application vs. text issue. It seems clear that "application/" is the preferable variant, so we'll change that in our web services accordingly (might take a while due to holidays). </div><div><br></div><div><span style="font-size:12.8px">I think I'd like to carefully disagree regarding "existing" mime types:</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> We risk here to "invent" </span><span style="font-size:12.8px">a new standard where some practice is already used in the wild.</span><br></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">That "some" practice is used "in the wild" is, in my eyes, a strong reason for, rather than against, attempting to agree on a standard. As far as I can see, the use of mime types in CLARIN so far has been largely uncoordinated. If we could confirm that "</span><span style="font-size:12.8px">text/exb+xml" and "text/tcf" are indeed the only mime types currently in use for EXMARaLDA and TCF files respectively, *and* if they were in use in more than one place, this might be a reason for accepting them as a de facto standard-like practice. I doubt that we can and that they are, though. And if we find the need to agree on something and to change some existing data accordingly, that something might as well follow the same logic as the TEI format variant mime types.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">I don't have any strong passions regarding the "tokenized" parameter issue. For the TEI/ISO transcriptions, we could live with both approaches. I am a bit worried, though, that further such distinctions ("normalized"? / "lemmatized"? / "tagged"?) 
might lead to an uncontrollable proliferation of mime-type-format variants, where the aim of the TEI standardisation was actually to reduce variation.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Best regards,</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Thomas</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><br></span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 11, 2016 at 7:10 PM, Piotr Bański <span dir="ltr"><<a href="mailto:banski@ids-mannheim.de" target="_blank">banski@ids-mannheim.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Dieter,<br>
<br>
Thank you so much for this catch. Indeed, on the one hand it's RFC 6129, and on the other, it's the "Architecture..." spec (in the previously quoted fragment [1]), that call for application/ in all cases. I'll modify this now in the wiki.<br>
<br>
[1]: <a href="https://www.w3.org/TR/webarch/#xml-media-types" rel="noreferrer" target="_blank">https://www.w3.org/TR/webarch/#xml-media-types</a><br>
<br>
Best regards,<br>
<br>
P.<div class="HOEnZb"><div class="h5"><br>
<br>
On 11/07/16 17:58, Dieter Van Uytvanck wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 08/07/16 17:42, Piotr Bański wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I have summarized Thomas's proposal at<br>
<a href="https://trac.clarin.eu/wiki/MIME%20format%20variants" rel="noreferrer" target="_blank">https://trac.clarin.eu/wiki/MIME%20format%20variants</a><br>
</blockquote>
Thank you Piotr!<br>
<br>
Great to see this discussion moving forward. Before I start editing the<br>
wiki, let me first check and mention a few points:<br>
<br>
- right now it states "text/tei+xml" as mimetype for TEI; shouldn't that<br>
be "application/tei+xml" ?<br>
<br>
- we have a CMDI component where we have gathered (at least a subset of)<br>
CLARIN-relevant mime types:<br>
<br>
<a href="https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ac_1271859438106&amp;registrySpace=public" rel="noreferrer" target="_blank">https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ac_1271859438106&amp;registrySpace=public</a><br>
<br>
It is not complete, but I will add it as a starting point to the trac<br>
(might take me a few days with the DH conference coming up)<br>
<br>
- I agree that the "format-variant=" is a necessary and elegant solution<br>
in case of e.g. the mimetype "application/tei+xml".<br>
<br>
- I am not sure about using this approach for general XML-based formats<br>
where no disambiguation on top of the mimetype is strictly necessary,<br>
since some form of mimetype is already in use (e.g. "text/tcf+xml", see<br>
<a href="https://vlo.clarin.eu/?q=text/tcf" rel="noreferrer" target="_blank">https://vlo.clarin.eu/?q=text/tcf</a> or "text/exb+xml", see<br>
<a href="https://vlo.clarin.eu/search?q=text/exb%2Bxml" rel="noreferrer" target="_blank">https://vlo.clarin.eu/search?q=text/exb%2Bxml</a>). We risk here to "invent"<br>
a new standard where some practice is already used in the wild.<br>
<br>
- Optional parameter(s) like "tokenized=0/1" were seen as problematic<br>
when discussing these with some of our developers - they can lead to<br>
arbitrary and unpredictable combinations. Maybe we can use something<br>
like "application/tei+xml;format-variant=dta-tokenized" instead? A major<br>
advantage of a finite list of format variants is that we can document<br>
every variant, e.g. with a link to an example file.<br>
<br>
best regards,<br>
</blockquote>
<br>
<br>
-- <br></div></div><div class="HOEnZb"><div class="h5">
Piotr Bański, Ph.D.<br>
Senior Researcher,<br>
Institut für Deutsche Sprache,<br>
R5 6-13<br>
68-161 Mannheim, Germany<br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Thomas Schmidt<br>IDS Mannheim<br>R5, 6-13<br>D-68161 Mannheim<br>Tel.: +49 (621) 1581-313<br><a href="http://agd.ids-mannheim.de/index.shtml" target="_blank">http://agd.ids-mannheim.de/index.shtml</a><br><a href="http://www.exmaralda.org" target="_blank">http://www.exmaralda.org</a></div></div></div>
</div>
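<div>For illustration only, here is a minimal sketch of how a consumer might handle the "format-variant" parameter discussed in this thread, checking a media type string against a finite list of documented variants. The function names and the <code>KNOWN_VARIANTS</code> table are assumptions made up for this example, not part of any CLARIN service; the only variant listed is the "dta-tokenized" example mentioned above.</div>

```python
def parse_media_type(value):
    """Split 'type/subtype;key=value;...' into (media_type, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


# Hypothetical registry of documented variants per media type (assumption).
KNOWN_VARIANTS = {
    "application/tei+xml": {"dta-tokenized"},
}


def is_documented(value):
    """True if the media type (and its format-variant, if any) is listed."""
    media_type, params = parse_media_type(value)
    variant = params.get("format-variant")
    if variant is None:
        return media_type in KNOWN_VARIANTS
    return variant in KNOWN_VARIANTS.get(media_type, set())


if __name__ == "__main__":
    example = "application/tei+xml;format-variant=dta-tokenized"
    print(parse_media_type(example))
    print(is_documented(example))
```

<div>Because the list of variants is finite, an undocumented value can simply be rejected or flagged, which would be harder with freely combinable parameters such as "tokenized=0/1".</div>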