[Standards] [Dev] proposal: using a common mime type for TEI files

Kai Zimmer zimmer at bbaw.de
Fri Jun 17 13:31:08 CEST 2016


Hi all,

I agree with Sander: creating our own MIME types is difficult and
time-consuming. Also, we already have good metadata inside the XML header
(schema definitions).
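
To illustrate what reading the schema definition from the XML header could
look like, here is a minimal sketch. It assumes the schema is referenced via
an xml-model processing instruction; the sample document and its schema URL
are made up for illustration, and files that declare their schema in another
way (e.g. xsi:schemaLocation) would need different handling.

```python
import re
from xml.dom import minidom

# Hypothetical sample document; the schema URL is invented for illustration.
SAMPLE = (
    '<?xml-model href="http://example.org/schemas/dta-basisformat.rng" '
    'type="application/xml"?>\n'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/></TEI>'
)


def schema_hrefs(xml_text):
    """Return the href of every xml-model processing instruction
    that appears at document level (i.e. before the root element)."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    for node in doc.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-model":
            # The PI data is plain text that only looks like attributes,
            # so the href has to be pulled out by pattern matching.
            match = re.search(r'href="([^"]*)"', node.data)
            if match:
                hrefs.append(match.group(1))
    return hrefs


print(schema_hrefs(SAMPLE))
```

A matching application could map the returned schema URL to a TEI variant
without any change to the media type served for the file.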

I see two main use cases:
a) a web browser accessing XML files via the VLO or the repository website
(where a DH community user would hopefully have an XML editor at hand
that can handle the MIME types used by IDS and DTA and also the XML
header)
b) WebLicht, which has access to the XML header and additionally to the
CMDI description

So, currently I don't see the need for additional MIME types, but maybe
you have other use cases in mind, Dieter?

IMHO the bigger problem is that many WebLicht services currently can't
handle XML files larger than maybe 1 MB (which covers at least 90% of our
DTA files), especially when additional layers are added.
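
For reference, one common way to cope with large XML files is to parse them
incrementally instead of loading the whole tree. The following is only a
sketch of that general technique using Python's standard library, not a
description of how any WebLicht service works; the function name and the
choice of counting <p> elements are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def count_paragraphs(source):
    """Count TEI <p> elements while streaming through the file."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == TEI_NS + "p":
            count += 1
            # Drop the paragraph's text and children once it has been
            # processed; the emptied element itself stays in the tree.
            elem.clear()
    return count


# Tiny in-memory stand-in for a large file on disk.
sample = io.BytesIO(
    b'<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    b"<p>one</p><p>two</p></body></text></TEI>"
)
print(count_paragraphs(sample))
```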

Best,
Kai



Am 17.06.2016 um 13:00 schrieb Sander Maijers:
> *not only the proper
> BTW, the same holds for suffixes.
>
>
> On Fri, Jun 17, 2016 at 12:59 PM, Sander Maijers <sander at clarin.eu> wrote:
>> Hi Dieter,
>>
>> Using custom media types can be done in a number of ways,
>> described in https://en.wikipedia.org/wiki/Media_type#Registration_trees
>> .
>> You stated the benefits of your solution well. Your solution has the
>> following costs:
>> - You'll have to either go through IANA registration procedure for new
>> media types in the ‘Personal or Vanity’ tree, go through IETF
>> Standards Action to get a CLARIN-specific tree, or break the standards
>> and use custom media types outside of this process.
>> - Whatever you opt for in this context, no third-party tools (i.e.,
>> general, standards-compliant tools) will recognize the media type of a
>> centre's content retrieved via PID URLs anymore.
>>
>> I find Menzo's approach not the proper as well as most useful one
>> compared to media type based approaches. After all, you would want a
>> resource's metadata to be completely descriptive of such elementary
>> aspects as internal structure and content of the TEI files, and not
>> dependent on system configuration (served as custom media type x or y,
>> as long as the server remains so configured).
>>
>> Best,
>> Sander
>>
>>
>> On Fri, Jun 17, 2016 at 11:39 AM, Dieter Van Uytvanck <dieter at clarin.eu> wrote:
>>> On 16/06/16 20:35, Thomas Schmidt wrote:
>>>> Therefore, we would need to distinguish this at whatever place it is
>>>> where WebLicht distinguishes file formats. If it is via the mime type,
>>>> we would need a mime type extension like "text/x-tei-isospoken+xml"
>>>> vs. "text/x-tei-dta+xml". If it is on some other level, we would have
>>>> to know which and agree on a suitable set of TEI variant identifiers.
>>>> I'm copying relevant parts of the mailing list exchange below for your
>>>> information.
>>> Dear Thomas,
>>>
>>> Thank you for this very insightful summary of the discussions on this
>>> topic. Looking at all the suggestions made, I think having detailed
>>> mimetype extensions would be the most convenient for most parties involved:
>>>
>>> - It puts the responsibility of providing an exact data type for a file
>>> at the side of the metadata creator/resource provider. This is always
>>> better than relying on interpretation by a third-party tool.
>>>
>>> - It does not require changes to (CMDI) metadata profiles.
>>>
>>> - It makes it feasible for tool/data matching applications (WebLicht,
>>> Switchboard, ...) to provide a meaningful processing application.
>>>
>>> There are of course approaches on other levels too (as suggested by
>>> Bart and Menzo), and these could be used in addition to the extended TEI
>>> mimetypes:
>>>
>>> - Matching applications could still try to parse a TEI file (in the
>>> absence of a detailed mime type) and make a guess about the sub-type,
>>> using @type where available. This is of course not trivial.
>>>
>>> - The ParameterGroup in the CMDI description can be added. But in many
>>> cases that requires metadata providers to change their profiles, which
>>> means quite a bit of additional work.
>>>
>>> I will join the TEI weblicht list, and try to gather a bit more concrete
>>> information in the near future at
>>>
>>> https://trac.clarin.eu/wiki/TEI%20variants
>>>
>>> (feel free to edit along)
>>>
>>> When we have that additional information, we can try to come up with
>>> concrete recommendations.
>>>
>>> best regards,
>>> --
>>> Dieter Van Uytvanck
>>> Technical Director CLARIN ERIC
>>> www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi
>>> _______________________________________________
>>> Dev mailing list
>>> Dev at lists.clarin.eu
>>> https://lists.clarin.eu/cgi-bin/mailman/listinfo/dev


