[Standards] [Dev] [Teiweblicht] proposal: using a common mime type for TEI files

Tomaž Erjavec Tomaz.Erjavec at ijs.si
Tue Jun 21 14:28:33 CEST 2016


Hi,

as regards

 > these format-related specifications (in this case: the name and possible
 > values of attributes which are used in addition to a mime type) would
 > need to be documented and made known at a central place.

I'd say the documentation for each format variant would need to be
accompanied by its TEI schema, i.e. the TEI ODD file and the schema
derived from it (probably RelaxNG). Then it would be a simple matter to
check whether a document conforms to the variant declared alongside the
mime type.
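
A minimal sketch of how such a check could start, assuming a central
registry that maps each format-variant value to the schema derived from
its ODD. The registry, the file paths, and the function names are
hypothetical and only illustrate the idea; the media type strings are the
ones proposed in this thread.

```python
from email.message import Message

# Hypothetical registry: format-variant value -> schema derived from the ODD.
# The paths are placeholders, not real published schema locations.
SCHEMAS = {
    "tei-dta": "schemas/tei-dta.rng",
    "tei-iso-spoken": "schemas/tei-iso-spoken.rng",
}


def parse_media_type(value):
    """Split a Content-Type style string into (media type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() returns the media type itself as the first pair; skip it.
    params = dict(msg.get_params()[1:])
    return msg.get_content_type(), params


def schema_for(value):
    """Return the registered schema path for a TEI variant, or None."""
    media_type, params = parse_media_type(value)
    if media_type != "text/tei+xml":
        return None
    return SCHEMAS.get(params.get("format-variant"))


if __name__ == "__main__":
    print(schema_for("text/tei+xml; format-variant=tei-dta; tokenized=1"))
```

The document itself would then be validated against the returned schema
with whatever RelaxNG validator is at hand; that step is left out here.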

Best,
Tomaž

On 21/06/2016 at 14:22, Bryan Jurish wrote:
> morning all,
>
> sounds good to me.
>
> @Marie: can you give an estimation of how well this might work for 
> WebLicht?
>
> I'll add the "format-variant=tei-dta" parameter to the DTA TEI<->TCF 
> web service in the next few days, so we can see how that at least 
> works out.
>
> marmosets,
>   Bryan
>
> On Tue, Jun 21, 2016 at 12:32 PM, Thomas Schmidt
> <thomas.schmidt at ids-mannheim.de> wrote:
>
>     Dear all,
>
>     revising my suggestions from the teiweblicht list according to Bryan's
>     proposal to use official mime-types plus parameters (instead of
>     x-extended custom mime types) would mean that:
>
>     "text/x-tei-isospoken+xml" could become "text/tei+xml;
>     format-variant=tei-iso-spoken" (+ tokenized=0/1)
>     "text/x-tei-dta+xml" could become "text/tei+xml;
>     format-variant=tei-dta" (+ tokenized=0/1)
>     "text/x-exmaralda-exb+xml" could become "text/xml;
>     format-variant=exmaralda-exb"
>     ... and so forth (for other TEI or XML-based formats)
>
>     Wouldn't that be a Solomonic solution? What do the WebLicht developers
>     say? And independently of that, I think that Hanna is right that these
>     format-related specifications (in this case: the name and possible
>     values of attributes which are used in addition to a mime type) would
>     need to be documented and made known at a central place. I guess it
>     would be up to the standards committee to decide on that?
>
>     Best regards,
>
>     Thomas
>
>
>
>
>
>     On Sat, Jun 18, 2016 at 10:56 AM, Bryan Jurish <jurish at bbaw.de> wrote:
>     > moin all,
>     >
>     > fwiw, I agree with Dieter that we need to differentiate between "proper"
>     > MIME types (i.e. standardized conventions registered with IANA) and
>     > CLARIN-internal (resp. WebLicht-internal) conventions.  We have been
>     > using MIME types as the basis of the WebLicht textSource/@type
>     > attribute, analogous to the HTTP "Content-Type" header, cf.
>     > https://tools.ietf.org/html/rfc2045#section-5.1 .  At the risk of
>     > repeating what I've already said on the tei-weblicht list, use of the
>     > Content-Type syntax allows us to have our cake and eat it too: we can
>     > go ahead and use "official" IANA-sanctioned "true" MIME types and
>     > specify variants ("dialects", "flavors") using parameters.  The DTA
>     > TEI<->TCF converter is already doing this, setting textSource/@type to
>     > either "text/tei+xml; tokenized=0" or "text/tei+xml; tokenized=1",
>     > depending on the relevant properties of the input document.
>     >
>     > just my €0.02.
>     >
>     > marmosets,
>     >   Bryan
>     >
>     >
>     > On Fri, Jun 17, 2016 at 1:43 PM, Dieter Van Uytvanck <dieter at clarin.eu>
>     > wrote:
>     >>
>     >> On 17/06/16 12:59, Sander Maijers wrote:
>     >> > After all, you would want a
>     >> > resource's metadata to be completely descriptive of such elementary
>     >> > aspects as internal structure and content of the TEI files, and not
>     >> > dependent on system configuration (served as custom media type x or
>     >> > y, as long as the server remains so configured).
>     >>
>     >> Hi Sander,
>     >>
>     >> Thank you for sharing your opinion.
>     >>
>     >> One side note: we are talking about detecting the mimetype as indicated
>     >> in the CMDI ResourceProxy attribute, see:
>     >>
>     >> https://www.clarin.eu/faq/how-can-i-specify-additional-details-about-resourceproxy
>     >>
>     >> So for the scenario VLO -> LR switchboard -> processing application
>     >>
>     >> the system configuration would not be relevant, since the mimetype is
>     >> explicitly mentioned in the metadata. The key is to find agreement about
>     >> a simple and light-weight way of designating the variants of TEI.
>     >>
>     >> best,
>     >>
>     >> --
>     >> Dieter Van Uytvanck
>     >> Technical Director CLARIN ERIC
>     >> www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi
>     >> _______________________________________________
>     >> Teiweblicht mailing list
>     >> Teiweblicht at lists.informatik.uni-leipzig.de
>     >> http://lists.informatik.uni-leipzig.de/mailman/listinfo/teiweblicht
>     >>
>     >
>     >
>     >
>     > --
>     > ***************************************************
>     > Bryan Jurish
>     > Deutsches Textarchiv
>     > Digitales Wörterbuch der deutschen Sprache
>     > Berlin-Brandenburgische Akademie der Wissenschaften
>     >
>     > Jägerstr. 22/23
>     > 10117 Berlin
>     >
>     > Tel.: +49 (0)30 20370 539
>     > E-Mail: jurish at bbaw.de
>     > ***************************************************
>     >
>
>
>
>     --
>     Thomas Schmidt
>     IDS Mannheim
>     R5, 6-13
>     D-68161 Mannheim
>     Tel.: +49 (621) 1581-313
>     http://agd.ids-mannheim.de/index.shtml
>     http://www.exmaralda.org
>
>
>
>
