[Standards] [Dev] proposal: using a common mime type for TEI files

Thomas Schmidt thomas.schmidt at ids-mannheim.de
Thu Jun 16 20:35:25 CEST 2016


Dear all,

yes, we've been discussing this on the teiweblicht mailing list (cc'ed
here) with the aim of making WebLicht usable for TEI data. We're stuck
with more or less the problem that Thorsten describes. We have two
TEI-based formats which we would like to consider at the moment. One
is DTA's format for written texts, the other is the recently finalised
ISO standard for transcriptions of spoken language. There is no way we
can treat those as just two different "flavours" of the same format.
Therefore, we would need to distinguish them at whatever level
WebLicht distinguishes file formats. If it is via the mime type,
we would need a mime type extension like "text/x-tei-isospoken+xml"
vs. "text/x-tei-dta+xml". If it is on some other level, we would have
to know which and agree on a suitable set of TEI variant identifiers.
I'm copying relevant parts of the mailing list exchange below for your
information.
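
To make the two options concrete, here is a minimal sketch in Python of
how a consumer could read the TEI variant from either form. It is an
illustration only, not how WebLicht actually works: the function name
tei_variant is hypothetical, the "dialect" parameter name is taken from
Hanna's message below, and "application/tei+xml" is assumed as the
generic type for the parameter-based option.

```python
from email.message import Message


def tei_variant(media_type):
    """Return the TEI variant named by a media type string, or None.

    Hypothetical helper: handles both proposals from this thread.
    """
    msg = Message()
    msg["Content-Type"] = media_type
    base = msg.get_content_type()  # lower-cased type/subtype, no parameters

    # Option 1: variant encoded in the subtype, e.g. text/x-tei-dta+xml
    prefix, suffix = "text/x-tei-", "+xml"
    if base.startswith(prefix) and base.endswith(suffix):
        return base[len(prefix):-len(suffix)]

    # Option 2: one generic type plus a "dialect" parameter (assumed name)
    if base == "application/tei+xml":
        return msg.get_param("dialect")  # None when the parameter is absent

    return None
```

With the second option, a bare "application/tei+xml" carries no variant
at all, so the caller has to decide what to do when the function
returns None.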

Best regards,

Thomas

----------------
On Tue, Apr 26, 2016 at 3:55 PM, Thomas Schmidt
<thomas.schmidt at ids-mannheim.de> wrote:
> [...] it is obvious that no sufficiently specialized processing method (e.g.
> individual WebLicht services) can handle "TEI" as a generic file type.
> There are way too many degrees of freedom in the TEI guidelines, and
> my understanding was that TEI is itself meant as just a framework in
> which more specific data models/file formats can be defined (which
> we've done...).
>
> The two TEI "dialects" we have so far (DTA and ISO/Spoken) should
> therefore be handled as two separate file types, just as any other two
> different inputs (say, RTF vs. OpenOfficeXML) are handled by WebLicht.
> In the current scenario, both TEI variants will need a TCF
> decoder/encoder anyway before anything meaningful can be done in
> WebLicht, and I don't think it makes sense to attempt a single
> decoder/encoder pair which handles both variants. So I would opt to
> make the distinction between the TEI dialects on the same level where
> other file types are distinguished. [...]
----------------
On Wed, Jun 1, 2016 at 3:40 PM, Bryan Jurish <jurish at bbaw.de> wrote:
> do you have concrete suggestions for @type and/or its possible
> (parameter=value) pairs?
----------------
On Thu, Jun 2, 2016 at 8:23 AM, Thomas Schmidt
<thomas.schmidt at ids-mannheim.de> wrote:
> My understanding was that all services in WebLicht have to specify
> their input and output formats via an appropriate mime type.
> Concerning the TEI data, this would mean that two (possibly more,
> eventually) types of TEI would have to be distinguished (instead of
> just one, as is the case now). For example:
>
> text/x-tei-isospoken+xml
> text/x-tei-dta+xml
>
> Since spoken language data will rarely come directly as TEI, converter
> services for the most common tool formats would have to be prepended
> (one from the HZSK is already available as a prototype). There already
> seem to be two different flavours of EXMARaLDA, but I couldn't find any
> documentation on the difference between the two. Ultimately, it would
> be good to be able to distinguish something like
>
> text/x-exmaralda-exb+xml (for EXMARaLDA basic transcriptions)
> text/x-transcriber-trs+xml (for Transcriber files)
> text/x-folker-flk+xml (for FOLKER transcriptions)
> ... (and possibly more for ELAN and others)
----------------
On Thu, Jun 2, 2016 at 9:45 AM, Hanna Hedeland <hanna.hedeland at gmail.com> wrote:
> [...] mimetypes are one option, [...], and I think what we need is
> 1) a decision whether to specify TEI dialects via different mimetypes or
> rather to use one TEI mimetype and an additional dialect parameter for the
> dialects - the WebLicht team would know about the implications of these
> options for the system, I can only imagine that the orchestration might
> become more complicated if some mimetypes can only be understood with an
> additional parameter, others on their own -  on the other hand maybe further
> mimetypes will have implications for the world outside WebLicht
>
> 2) some way of managing the inventory of used mimetypes or
> mimetypes+parameters to ensure we all know which file formats are in use and
> how they should be described (especially relevant for converters)
>
> I think the webservice developers will have really valuable input, but in
> the end, maybe the WebLicht developers have to decide on this as they will
> be implementing the chosen solution?
----------------
On Thu, Jun 16, 2016 at 5:11 PM, Thorsten Trippel
<thorsten.trippel at uni-tuebingen.de> wrote:
> Yes it was in this context where I heard this discussion. The TEI importer,
> as far as I can tell, does not import generic TEI but only specific flavors.
> If we send a TEI file to WebLicht, the TEI tool will assume it is according
> to this specific flavor; I did not test what happens if it is not. I am
> afraid it is getting messy.... WebLicht looks at the type of file to suggest
> matching webservices.  Maybe somebody else can provide more details, for
> example the Berlin team... or Hamburg if they read along...
>
> Cheers
> Thorsten
>
>
> On 16.06.16 at 17:06, Dieter Van Uytvanck wrote:
>>
>> On 16/06/16 16:41, Thorsten Trippel wrote:
>>>
>>> Unless of course the tools really interpret all profiles or all TEI
>>> flavors.
>>
>>
>> Hi Thorsten,
>>
>> you are anticipating my next question - what web applications do we have
>> that can process TEI files in general, independent from the different
>> subvariants?
>>
>> At least WebLicht seems to have a TEI importer
>> (http://wiki.tei-c.org/index.php/WebLicht#cite_note-1 - taken from the
>> list at http://wiki.tei-c.org/index.php/Category:Analysis_tools). Do you
>> know if it is generic, or if it expects a specific sub-variant?
>>
>> And I would hope there are more out there...
>>
>> best,
>>
>
>
> --
> ----------------------------------------------------------------------------
> ///////// Dr. Thorsten Trippel   thorsten.trippel at uni-tuebingen.de
>    //     Seminar für Sprachwissenschaft
>   //  //  Eberhard-Karls-Universität Tübingen
>  //  //   Office:  Wilhelmstr. 19 #2.17
>     //    Phone:   +49 (0)7071-29-77352
> ///////// Federal Republic of Germany
> -----------------------------------------------------------------------------



-- 
Thomas Schmidt
IDS Mannheim
R5, 6-13
D-68161 Mannheim
Tel.: +49 (621) 1581-313
http://agd.ids-mannheim.de/index.shtml
http://www.exmaralda.org

