[Dev] [Teiweblicht] proposal: using a common mime type for TEI files

Bryan Jurish jurish at bbaw.de
Sat Jun 18 10:56:41 CEST 2016


moin all,

fwiw, I agree with Dieter that we need to differentiate between "proper"
MIME types (i.e. standardized conventions registered with IANA) and
CLARIN-internal (resp. WebLicht-internal) conventions.  We have been using
MIME types as the basis of the WebLicht textSource/@type attribute,
analogous to the HTTP "Content-Type" header, cf.
https://tools.ietf.org/html/rfc2045#section-5.1 .  At the risk of repeating
what I've already said on the tei-weblicht list, use of the Content-Type
syntax allows us to have our cake and eat it too: we can go ahead and use
"official" IANA-sanctioned "true" MIME types and specify variants
("dialects", "flavors") using parameters.  The DTA TEI<->TCF converter is
already doing this, setting textSource/@type to either "text/tei+xml;
tokenized=0" or "text/tei+xml; tokenized=1", depending on the relevant
properties of the input document.
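
To illustrate how a consumer might split such a value into the base type and
its parameters, here is a minimal sketch using Python's standard email
package.  The function names parse_type and is_tokenized are hypothetical and
only for illustration; this is not how WebLicht or the DTA converter actually
reads the attribute, just one way the Content-Type syntax can be handled:

```python
from email.message import Message


def parse_type(value):
    """Split a Content-Type style string into (base type, parameter dict).

    Hypothetical helper: relies on the stdlib header parser rather than
    on any WebLicht-specific code.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns [(base_type, ''), (name, value), ...];
    # skip the first entry and normalise parameter names to lower case.
    params = {name.lower(): val for name, val in msg.get_params()[1:]}
    return msg.get_content_type(), params


def is_tokenized(value):
    """Return True/False if a 'tokenized' parameter is given, else None."""
    _, params = parse_type(value)
    flag = params.get("tokenized")
    if flag is None:
        return None  # the variant is simply not specified
    return flag == "1"


base, params = parse_type("text/tei+xml; tokenized=1")
print(base, params)
```

A tool that only cares about the base type can ignore the parameters
entirely, while one that needs the variant can inspect them.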

just my €0.02.

marmosets,
  Bryan


On Fri, Jun 17, 2016 at 1:43 PM, Dieter Van Uytvanck <dieter at clarin.eu>
wrote:

> On 17/06/16 12:59, Sander Maijers wrote:
> > After all, you would want a
> > resource's metadata to be completely descriptive of such elementary
> > aspects as internal structure and content of the TEI files, and not
> > dependent on system configuration (served as custom media type x or y,
> > as long as the server remains so configured).
>
> Hi Sander,
>
> Thank you for sharing your opinion.
>
> One side note: we are talking about detecting the mimetype as indicated
> in the CMDI ResourceProxy attribute, see:
>
>
> https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy
>
> So for the scenario VLO -> LR switchboard -> processing application
>
> the system configuration would not be relevant, since the mimetype is
> explicitly mentioned in the metadata. The key is to find agreement about
> a simple and light-weight way of designating the variants of TEI.
>
> best,
>
> --
> Dieter Van Uytvanck
> Technical Director CLARIN ERIC
> www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi
> _______________________________________________
> Teiweblicht mailing list
> Teiweblicht at lists.informatik.uni-leipzig.de
> http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht
>
>


-- 
***************************************************
Bryan Jurish
Deutsches Textarchiv
Digitales Wörterbuch der deutschen Sprache
Berlin-Brandenburgische Akademie der Wissenschaften

Jägerstr. 22/23
10117 Berlin

Tel.:     +49 (0)30 20370 539
E-Mail:   jurish at bbaw.de
***************************************************