[Dev] [Teiweblicht] [Standards] proposal: using a common mime type for TEI files

Thomas Schmidt thomas.schmidt at ids-mannheim.de
Tue Jul 12 10:14:35 CEST 2016


Dear all,

thanks, Dieter and Piotr, for pointing out the application vs. text issue.
It seems clear that "application/" is the preferable variant, so we'll
change that in our web services accordingly (this might take a while due to
holidays).
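For illustration only, here is a minimal Python sketch of the kind of change meant. It assumes a hypothetical helper `prefer_application()` in a web service that rewrites a legacy "text/<name>+xml" type to its "application/" counterpart and passes any parameters through untouched. Types without the "+xml" suffix (such as "text/tcf") are deliberately left alone:

```python
def prefer_application(media_type: str) -> str:
    """Rewrite a legacy 'text/<name>+xml' media type to 'application/<name>+xml'.

    Hypothetical helper for illustration. Any ';'-separated parameters are
    passed through unchanged; type and subtype are lowercased because they
    are case-insensitive.
    """
    base, sep, params = media_type.partition(";")
    top, slash, subtype = base.strip().lower().partition("/")
    if top == "text" and subtype.endswith("+xml"):
        top = "application"
    return f"{top}{slash}{subtype}{sep}{params}"


print(prefer_application("text/tei+xml"))  # application/tei+xml
print(prefer_application("text/tcf"))      # text/tcf (no +xml suffix, unchanged)
```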

I would like to respectfully disagree regarding "existing" mime types:

> We risk here to "invent" a new standard where some practice is already
> used in the wild.

That "some" practice is used "in the wild" is, in my eyes, a strong reason
for, rather than against, attempting to agree on a standard. As far as I
can see, the use of mime types in CLARIN so far has been largely
uncoordinated. If we could confirm that "text/exb+xml" and "text/tcf" are
indeed the only mime types currently in use for EXMARaLDA and TCF files
respectively, *and* if they were in use in more than one place, this might
be a reason for accepting them as a de facto standard practice. I
doubt that we can and that they are, though. And if we find the need to
agree on something and to change some existing data accordingly, that
something might as well follow the same logic as the TEI format variant
mime types.
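To make "the same logic" concrete: under the TEI proposal, a consumer reads the base type first and the format-variant parameter second. Below is a simplified parser, assuming the plain "type/subtype;name=value" syntax without quoted parameter values. `parse_media_type()` is a hypothetical name; "dta-tokenized" is the variant name from Dieter's mail below:

```python
from typing import Dict, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype;name=value;...' into the base type and its parameters.

    Simplified sketch: quoted parameter values containing ';' or '=' are not
    handled. Type, subtype and parameter names are lowercased (they are
    case-insensitive); parameter values are kept as given.
    """
    base, *raw_params = value.split(";")
    params: Dict[str, str] = {}
    for raw in raw_params:
        if not raw.strip():
            continue  # tolerate a trailing or doubled ';'
        name, eq, val = raw.strip().partition("=")
        if not eq:
            raise ValueError(f"malformed parameter: {raw!r}")
        params[name.strip().lower()] = val.strip()
    return base.strip().lower(), params


print(parse_media_type("application/tei+xml;format-variant=dta-tokenized"))
# ('application/tei+xml', {'format-variant': 'dta-tokenized'})
```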

I don't have any strong passions regarding the "tokenized" parameter issue.
For the TEI/ISO transcriptions, we could live with both approaches. I am a
bit worried, though, that further such distinctions ("normalized"? /
"lemmatized"? / "tagged"?) might lead to an uncontrollable proliferation of
mime-type-format variants, where the aim of the TEI standardisation was
actually to reduce variation.
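The proliferation worry can be put in numbers. With k independent on/off parameters, a consumer may meet 2^k variants, whereas a closed list (as Dieter suggests below) only grows when someone registers and documents a new entry. Below is a minimal sketch. The flag names are the examples from this paragraph, `check_variant()` is hypothetical, and the registry holds only the one variant name mentioned in the thread, with a placeholder description:

```python
from itertools import product

# Hypothetical boolean parameters, using the examples from the discussion.
FLAGS = ["tokenized", "normalized", "lemmatized", "tagged"]

# Every on/off combination of the flags is a distinct variant a consumer may meet.
flag_variants = list(product([0, 1], repeat=len(FLAGS)))

# Closed alternative: each named variant is registered once and documented,
# e.g. with a link to an example file (the description here is a placeholder).
REGISTERED_VARIANTS = {
    "dta-tokenized": "documentation link to be added (placeholder)",
}


def check_variant(name: str) -> str:
    """Return the documentation entry for a variant, rejecting unknown names."""
    if name not in REGISTERED_VARIANTS:
        raise ValueError(f"unregistered format variant: {name!r}")
    return REGISTERED_VARIANTS[name]


print(len(flag_variants))  # 16, i.e. 2**4
```

Adding a fifth flag would double the open-ended space to 32, while the closed list stays at whatever has actually been documented.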

Best regards,

Thomas


On Mon, Jul 11, 2016 at 7:10 PM, Piotr Bański <banski at ids-mannheim.de>
wrote:

> Dear Dieter,
>
> Thank you so much for this catch. Indeed, both RFC 6129, on the one hand,
> and the "Architecture..." spec (in the previously quoted fragment [1]), on
> the other, call for application/ in all cases. I'll modify this now in the
> wiki.
>
> [1]: https://www.w3.org/TR/webarch/#xml-media-types
>
> Best regards,
>
>   P.
>
>
> On 11/07/16 17:58, Dieter Van Uytvanck wrote:
>
>> On 08/07/16 17:42, Piotr Bański wrote:
>>
>>> I have summarized Thomas's proposal at
>>> https://trac.clarin.eu/wiki/MIME%20format%20variants
>>>
>> Thank you Piotr!
>>
>> Great to see this discussion moving forward. Before I start editing the
>> wiki, let me first check and mention a few points:
>>
>> - right now it states "text/tei+xml" as mimetype for TEI; shouldn't that
>> be "application/tei+xml" ?
>>
>> - we have a CMDI component where we have gathered (at least a subset of)
>> CLARIN-relevant mime types:
>>
>>
>> https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ac_1271859438106&registrySpace=public
>>
>> It is not complete, but I will add it as a starting point to the trac
>> (might take me a few days with the DH conference coming up)
>>
>> - I agree that the "format-variant=" is a necessary and elegant solution
>> in case of e.g. the mimetype "application/tei+xml".
>>
>> - I am not sure about using this approach for general XML-based formats
>> where no disambiguation on top of the mimetype is strictly necessary,
>> since some form of mimetype is already in use (e.g. "text/tcf+xml", see
>> https://vlo.clarin.eu/?q=text/tcf or "text/exb+xml", see
>> https://vlo.clarin.eu/search?q=text/exb%2Bxml). We risk here to "invent"
>> a new standard where some practice is already used in the wild.
>>
>> - Optional parameter(s) like "tokenized=0/1" were seen as problematic
>> when discussing these with some of our developers - they can lead to
>> arbitrary and unpredictable combinations. Maybe we can use something
>> like "application/tei+xml;format-variant=dta-tokenized" instead? A major
>> advantage of a finite list of format variants is that we can document
>> every variant, e.g. with a link to an example file.
>>
>> best regards,
>>
>
>
> --
> Piotr Bański, Ph.D.
> Senior Researcher,
> Institut für Deutsche Sprache,
> R5 6-13
> 68-161 Mannheim, Germany
>
>


-- 
Thomas Schmidt
IDS Mannheim
R5, 6-13
D-68161 Mannheim
Tel.: +49 (621) 1581-313
http://agd.ids-mannheim.de/index.shtml
http://www.exmaralda.org