[Standards] [Teiweblicht] [Dev] proposal: using a common mime type for TEI files

Marie Hinrichs marie.hinrichs at uni-tuebingen.de
Fri Jul 15 18:25:18 CEST 2016


Hi All,

As far as WebLicht is concerned, the name for the input/output type feature is just that, a name. So when people want to introduce a new type, it is up to them what to call it, and I think the suggested names are good.

However, changing the tcf type feature name may cause some serious backwards compatibility issues, especially when using WaaS (WebLicht as a Service). Chains created with the “old” name probably will not work anymore.

So, regarding the new names, WebLicht is fine with whatever the web service developers decide on. Renaming the tcf type feature, however, may be problematic, and I think it is probably not worth it.
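
To illustrate the compatibility concern, here is a minimal sketch (not WebLicht's actual implementation) of how a consumer could keep accepting an "old" type name after a rename by normalising it through an alias table. The function names, the alias table, and the target name "application/tcf+xml" are assumptions made for this example only; "text/tcf+xml" and the "format-variant" parameter are taken from the thread below.

```python
# Hypothetical alias table: legacy type names mapped to a renamed type.
# "application/tcf+xml" is an assumed new name, used only for illustration.
LEGACY_ALIASES = {
    "text/tcf+xml": "application/tcf+xml",
    "text/tei+xml": "application/tei+xml",
}


def parse_media_type(value):
    """Split 'type/subtype;key=val;...' into (essence, params)."""
    essence, _, rest = value.partition(";")
    params = {}
    for part in rest.split(";"):
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return essence.strip().lower(), params


def normalize(value):
    """Map a legacy type name to its renamed form; keep parameters as-is."""
    essence, params = parse_media_type(value)
    return LEGACY_ALIASES.get(essence, essence), params


def same_type(a, b):
    """True if two type strings match after alias normalisation."""
    return normalize(a) == normalize(b)
```

With such a table, a chain stored with "text/tcf+xml" would still match a service that now declares the renamed type, while two TEI inputs with different format-variant values would still be treated as different types. Without something like it, every stored chain would have to be migrated.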

Best Regards,
Marie


> On 12.07.2016, at 10:14, Thomas Schmidt <thomas.schmidt at ids-mannheim.de> wrote:
> 
> Dear all,
> 
> thanks, Dieter and Piotr, for pointing out the application vs. text issue. It seems clear that "application/" is the preferable variant, so we'll change that in our web services accordingly (might take a while due to holidays).
> 
> I think I'd like to carefully disagree regarding "existing" mime types:
> 
> > We risk here to "invent" a new standard where some practice is already used in the wild.
> 
> That "some" practice is used "in the wild" is, in my eyes, a strong reason for, rather than against, attempting to agree on a standard. As far as I can see, the use of mime types in CLARIN so far has been largely uncoordinated. If we could confirm that "text/exb+xml" and "text/tcf" are indeed the only mime types currently in use for EXMARaLDA and TCF files respectively, *and* if they were in use in more than one place, this might be a reason for accepting them as a de facto standard-like-practice. I doubt that we can and that they are, though. And if we find the need to agree on something and to change some existing data accordingly, that something might as well follow the same logic as the TEI format variant mime types.
> 
> I don't have any strong passions regarding the "tokenized" parameter issue. For the TEI/ISO transcriptions, we could live with both approaches. I am a bit worried, though, that further such distinctions ("normalized"? / "lemmatized"? / "tagged"?) might lead to an uncontrollable proliferation of mime-type-format variants, where the aim of the TEI standardisation was actually to reduce variation.
> 
> Best regards,
> 
> Thomas
> 
> 
> 
> 
> 
> On Mon, Jul 11, 2016 at 7:10 PM, Piotr Bański <banski at ids-mannheim.de> wrote:
> Dear Dieter,
> 
> Thank you so much for this catch. Indeed, on the one hand it's RFC 6129, and on the other, it's the "Architecture..." spec (in the previously quoted fragment [1]), that call for application/ in all cases. I'll modify this now in the wiki.
> 
> [1]: https://www.w3.org/TR/webarch/#xml-media-types
> 
> Best regards,
> 
>   P.
> 
> 
> On 11/07/16 17:58, Dieter Van Uytvanck wrote:
> On 08/07/16 17:42, Piotr Bański wrote:
> I have summarized Thomas's proposal at
> https://trac.clarin.eu/wiki/MIME%20format%20variants
> Thank you Piotr!
> 
> Great to see this discussion moving forward. Before I start editing the
> wiki, let me first check and mention a few points:
> 
> - right now it states "text/tei+xml" as mimetype for TEI; shouldn't that
> be "application/tei+xml" ?
> 
> - we have a CMDI component where we have gathered (at least a subset of)
> CLARIN-relevant mime types:
> 
> https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ac_1271859438106&registrySpace=public
> 
> It is not complete, but I will add it as a starting point to the trac
> (might take me a few days with the DH conference coming up)
> 
> - I agree that the "format-variant=" is a necessary and elegant solution
> in case of e.g. the mimetype "application/tei+xml".
> 
> - I am not sure about using this approach for general XML-based formats
> where no disambiguation on top of the mimetype is strictly necessary,
> since some form of mimetype is already in use (e.g. "text/tcf+xml", see
> https://vlo.clarin.eu/?q=text/tcf or "text/exb+xml", see
> https://vlo.clarin.eu/search?q=text/exb%2Bxml). We risk here to "invent"
> a new standard where some practice is already used in the wild.
> 
> - Optional parameter(s) like "tokenized=0/1" were seen as problematic
> when discussing these with some of our developers - they can lead to
> arbitrary and unpredictable combinations. Maybe we can use something
> like "application/tei+xml;format-variant=dta-tokenized" instead? A major
> advantage of a finite list of format variants is that we can document
> every variant, e.g. with a link to an example file.
> 
> best regards,
> 
> 
> -- 
> Piotr Bański, Ph.D.
> Senior Researcher,
> Institut für Deutsche Sprache,
> R5 6-13
> 68-161 Mannheim, Germany
> 
> 
> 
> 
> -- 
> Thomas Schmidt
> IDS Mannheim
> R5, 6-13
> D-68161 Mannheim
> Tel.: +49 (621) 1581-313
> http://agd.ids-mannheim.de/index.shtml
> http://www.exmaralda.org