[Dev] [Teiweblicht] [Standards] proposal: using a common mime type for TEI files

Dieter Van Uytvanck dieter at clarin.eu
Mon Jul 11 17:58:15 CEST 2016


On 08/07/16 17:42, Piotr Bański wrote:
> I have summarized Thomas's proposal at
> https://trac.clarin.eu/wiki/MIME%20format%20variants

Thank you Piotr!

Great to see this discussion moving forward. Before I start editing the
wiki, let me first check and mention a few points:

- right now it states "text/tei+xml" as the mimetype for TEI; shouldn't that
be "application/tei+xml"?

- we have a CMDI component where we have gathered (at least a subset of)
CLARIN-relevant mime types:

https://catalog.clarin.eu/ds/ComponentRegistry#/?itemId=clarin.eu%3Acr1%3Ac_1271859438106&registrySpace=public

It is not complete, but I will add it as a starting point to the trac
(might take me a few days with the DH conference coming up)

- I agree that the "format-variant=" is a necessary and elegant solution
in case of e.g. the mimetype "application/tei+xml".

- I am not sure about using this approach for general XML-based formats
where no disambiguation on top of the mimetype is strictly necessary,
since some form of mimetype is already in use (e.g. "text/tcf+xml", see
https://vlo.clarin.eu/?q=text/tcf or "text/exb+xml", see
https://vlo.clarin.eu/search?q=text/exb%2Bxml). We risk "inventing"
a new standard where an existing practice is already in use in the wild.

- Optional parameter(s) like "tokenized=0/1" were seen as problematic
when discussing these with some of our developers - they can lead to
arbitrary and unpredictable combinations. Maybe we can use something
like "application/tei+xml;format-variant=dta-tokenized" instead? A major
advantage of a finite list of format variants is that we can document
every variant, e.g. with a link to an example file.
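
To illustrate the finite-list idea, here is a minimal sketch of how a
consumer might check a mimetype string against a documented set of format
variants. The function names and the contents of KNOWN_VARIANTS are
assumptions made for this example only; the only variant name taken from
this thread is "dta-tokenized", and no such registry exists yet.

```python
# Hypothetical registry of documented format variants per mimetype.
# Only "dta-tokenized" comes from the discussion; the structure is assumed.
KNOWN_VARIANTS = {
    "application/tei+xml": {"dta-tokenized"},
}


def parse_mime(value):
    """Split 'type/subtype;key=value;...' into (mimetype, params dict)."""
    parts = [p.strip() for p in value.split(";")]
    mimetype = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, val = part.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {part!r}")
        params[key.strip().lower()] = val.strip().strip('"')
    return mimetype, params


def is_known_variant(value):
    """Return True if the mimetype (and its variant, if given) is documented."""
    mimetype, params = parse_mime(value)
    variant = params.get("format-variant")
    if variant is None:
        # Bare mimetype without a variant: accept only registered types.
        return mimetype in KNOWN_VARIANTS
    return variant in KNOWN_VARIANTS.get(mimetype, set())
```

With such a closed list, "application/tei+xml;format-variant=dta-tokenized"
is accepted, while an undocumented variant name is rejected instead of
silently passing through, which is what free-form flags like "tokenized=0/1"
would make hard to guarantee.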

best regards,
-- 
Dieter Van Uytvanck
Technical Director CLARIN ERIC
www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi

