[Standards] [Teiweblicht] [Dev] proposal: using a common mime type for TEI files

Bryan Jurish jurish at bbaw.de
Tue Jun 21 14:22:24 CEST 2016


morning all,

sounds good to me.

@Marie: can you give an estimate of how well this might work for WebLicht?

I'll add the "format-variant=tei-dta" parameter to the DTA TEI<->TCF web
service in the next few days, so we can see how that at least works out.
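
For illustration, here is a minimal sketch of how a consumer might split such
a parametrized type string into its base media type and its parameters. The
function name parse_media_type is hypothetical, and the parsing is deliberately
simplified: it assumes plain "name=value" parameters and does not handle the
full RFC 2045 syntax (e.g. quoted values containing semicolons, or comments).

```python
def parse_media_type(value):
    """Split a Content-Type-style string into (media_type, params).

    Simplified sketch: assumes ';'-separated "name=value" parameters and
    does NOT implement the full RFC 2045 grammar.
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip empty or malformed segments
        # Parameter names are case-insensitive; drop optional surrounding quotes.
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


media_type, params = parse_media_type(
    "text/tei+xml; format-variant=tei-dta; tokenized=1"
)
print(media_type)                    # base type, usable for generic TEI handling
print(params.get("format-variant"))  # variant, if the producer declared one
print(params.get("tokenized"))       # "0" or "1", if declared
```

A consumer that only understands the base type can ignore the parameters,
while a variant-aware one can dispatch on "format-variant".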

marmosets,
  Bryan

On Tue, Jun 21, 2016 at 12:32 PM, Thomas Schmidt <
thomas.schmidt at ids-mannheim.de> wrote:

> Dear all,
>
> revising my suggestions from the teiweblicht list according to Bryan's
> proposal to use official mime-types plus parameters (instead of
> x-extended custom mime types) would mean that:
>
> "text/x-tei-isospoken+xml" could become "text/tei+xml;
> format-variant=tei-iso-spoken" (+ tokenized=0/1)
> "text/x-tei-dta+xml" could become "text/tei+xml;
> format-variant=tei-dta" (+ tokenized=0/1)
> "text/x-exmaralda-exb+xml" could become "text/xml;
> format-variant=exmaralda-exb"
> ... and so forth (for other TEI- or XML-based formats)
>
> Wouldn't that be a Solomonic solution? What do the WebLicht developers
> say? And independently of that, I think that Hanna is right that these
> format-related specifications (in this case: the name and possible
> values of attributes which are used in addition to a mime type) would
> need to be documented and made known at a central place. I guess it
> would be up to the standards committee to decide on that?
>
> Best regards,
>
> Thomas
>
>
>
>
>
> On Sat, Jun 18, 2016 at 10:56 AM, Bryan Jurish <jurish at bbaw.de> wrote:
> > moin all,
> >
> > fwiw, I agree with Dieter that we need to differentiate between "proper"
> > MIME types (i.e. standardized conventions registered with IANA) and
> > CLARIN-internal (resp. WebLicht-internal) conventions.  We have been using
> > MIME types as the basis of the WebLicht textSource/@type attribute,
> > analogous to the HTTP "Content-Type" header, cf.
> > https://tools.ietf.org/html/rfc2045#section-5.1 .  At the risk of repeating
> > what I've already said on the tei-weblicht list, use of the Content-Type
> > syntax allows us to have our cake and eat it too: we can go ahead and use
> > "official" IANA-sanctioned "true" MIME types and specify variants
> > ("dialects", "flavors") using parameters.  The DTA TEI<->TCF converter is
> > already doing this, setting textSource/@type to either "text/tei+xml;
> > tokenized=0" or "text/tei+xml; tokenized=1", depending on the relevant
> > properties of the input document.
> >
> > just my €0.02.
> >
> > marmosets,
> >   Bryan
> >
> >
> > On Fri, Jun 17, 2016 at 1:43 PM, Dieter Van Uytvanck <dieter at clarin.eu>
> > wrote:
> >>
> >> On 17/06/16 12:59, Sander Maijers wrote:
> >> > After all, you would want a
> >> > resource's metadata to be completely descriptive of such elementary
> >> > aspects as internal structure and content of the TEI files, and not
> >> > dependent on system configuration (served as custom media type x or y,
> >> > as long as the server remains so configured).
> >>
> >> Hi Sander,
> >>
> >> Thank you for sharing your opinion.
> >>
> >> One side note: we are talking about detecting the mimetype as indicated
> >> in the CMDI ResourceProxy attribute, see:
> >>
> >> https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy
> >>
> >> So for the scenario VLO -> LR switchboard -> processing application
> >>
> >> the system configuration would not be relevant, since the mimetype is
> >> explicitly mentioned in the metadata. The key is to find agreement about
> >> a simple and light-weight way of designating the variants of TEI.
> >>
> >> best,
> >>
> >> --
> >> Dieter Van Uytvanck
> >> Technical Director CLARIN ERIC
> >> www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi
> >> _______________________________________________
> >> Teiweblicht mailing list
> >> Teiweblicht at lists.informatik.uni-leipzig.de
> >> http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht
> >>
> >
> >
> >
> > --
> > ***************************************************
> > Bryan Jurish
> > Deutsches Textarchiv
> > Digitales Wörterbuch der deutschen Sprache
> > Berlin-Brandenburgische Akademie der Wissenschaften
> >
> > Jägerstr. 22/23
> > 10117 Berlin
> >
> > Tel.:     +49 (0)30 20370 539
> > E-Mail:   jurish at bbaw.de
> > ***************************************************
> >
> >
>
>
>
> --
> Thomas Schmidt
> IDS Mannheim
> R5, 6-13
> D-68161 Mannheim
> Tel.: +49 (621) 1581-313
> http://agd.ids-mannheim.de/index.shtml
> http://www.exmaralda.org
>
>


-- 
***************************************************
Bryan Jurish
Deutsches Textarchiv
Digitales Wörterbuch der deutschen Sprache
Berlin-Brandenburgische Akademie der Wissenschaften

Jägerstr. 22/23
10117 Berlin

Tel.:     +49 (0)30 20370 539
E-Mail:   jurish at bbaw.de
***************************************************