[Standards] [Dev] proposal: using a common mime type for TEI files

Sander Maijers sander at clarin.eu
Fri Jun 17 13:00:59 CEST 2016


*not only the proper
BTW, the same holds for suffixes.


On Fri, Jun 17, 2016 at 12:59 PM, Sander Maijers <sander at clarin.eu> wrote:
> Hi Dieter,
>
> Using custom media types can be done in a number of ways, as
> described in https://en.wikipedia.org/wiki/Media_type#Registration_trees.
> You stated the benefits of your solution well. Your solution has the
> following costs:
> - You'll have to either go through the IANA registration procedure for new
> media types in the ‘Personal or Vanity’ tree, go through an IETF
> Standards Action to get a CLARIN-specific tree, or break the standards
> and use custom media types outside of this process.
> - Whichever option you choose, no third-party tools (i.e., general,
> standards-compliant ones) will recognize the media type of a centre's
> content retrieved via PID URLs anymore.
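The registration trees differ only in the facet prefix of the subtype (RFC 6838, section 3). A minimal Python sketch that classifies a media type string by that prefix illustrates why the "x-" types proposed further down in this thread fall outside every registered tree. It only inspects the string and does not consult the IANA registry; the vnd./prs. example names are hypothetical.

```python
def registration_tree(media_type: str) -> str:
    """Classify a media type by the facet prefix of its subtype (RFC 6838, section 3).

    Only the string is inspected; no IANA registry lookup is made, so "standards"
    means "no facet prefix", not "actually registered".
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, sep, subtype = essence.partition("/")
    if not sep or not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("prs."):
        return "personal or vanity"
    if subtype.startswith("x."):
        return "unregistered"
    if subtype.startswith("x-"):
        # Legacy "x-" names belong to no tree; RFC 6838 discourages them.
        return "unregistered (legacy x- prefix)"
    return "standards"


for mt in ("application/tei+xml",
           "text/x-tei-isospoken+xml",
           "application/vnd.example.tei+xml",     # hypothetical name
           "application/prs.example.tei+xml"):    # hypothetical name
    print(f"{mt:35} -> {registration_tree(mt)}")
```

A general-purpose tool that knows application/tei+xml (registered in RFC 6129) has no way to relate it to text/x-tei-isospoken+xml, which is the compatibility cost described above.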
>
> I find Menzo's approach not only the proper one but also the most useful
> one compared to media-type-based approaches. After all, you would want a
> resource's metadata to be completely descriptive of such elementary
> aspects as the internal structure and content of the TEI files, and not
> dependent on system configuration (being served as custom media type x or
> y only for as long as the server remains so configured).
>
> Best,
> Sander
>
>
> On Fri, Jun 17, 2016 at 11:39 AM, Dieter Van Uytvanck <dieter at clarin.eu> wrote:
>> On 16/06/16 20:35, Thomas Schmidt wrote:
>>> Therefore, we would need to distinguish this at whatever place
>>> WebLicht uses to distinguish file formats. If it is via the mime type,
>>> we would need a mime type extension like "text/x-tei-isospoken+xml"
>>> vs. "text/x-tei-dta+xml". If it is on some other level, we would have
>>> to know which and agree on a suitable set of TEI variant identifiers.
>>> I'm copying relevant parts of the mailing list exchange below for your
>>> information.
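If the variant were carried in the mime type, a consuming tool would split the type into its parts and read the variant off the subtype. The sketch below assumes the "x-tei-<variant>+xml" pattern of the two examples above; that naming convention, and the function names, are hypothetical and were never agreed on.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    top_level: str          # e.g. "text"
    subtype: str            # subtype without suffix, e.g. "x-tei-dta"
    suffix: Optional[str]   # structured syntax suffix, e.g. "xml", or None


def parse_media_type(value: str) -> MediaType:
    """Split 'type/subtype+suffix; params' into its parts (parameters are dropped)."""
    essence = value.split(";", 1)[0].strip().lower()
    top_level, sep, subtype = essence.partition("/")
    if not sep or not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {value!r}")
    base, plus, suffix = subtype.rpartition("+")
    if plus and base and suffix:
        return MediaType(top_level, base, suffix)
    return MediaType(top_level, subtype, None)


# Hypothetical convention read off the examples above: "x-tei-<variant>+xml".
TEI_VARIANT_PREFIX = "x-tei-"


def tei_variant(value: str) -> Optional[str]:
    """Return the variant identifier ('isospoken', 'dta', ...) or None if there is none."""
    mt = parse_media_type(value)
    if mt.suffix == "xml" and mt.subtype.startswith(TEI_VARIANT_PREFIX):
        return mt.subtype[len(TEI_VARIANT_PREFIX):] or None
    return None


print(tei_variant("text/x-tei-isospoken+xml"))  # isospoken
```

The "+xml" suffix still lets a tool that does not know the variant fall back to treating the file as generic XML.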
>>
>> Dear Thomas,
>>
>> Thank you for this very insightful summary of the discussions on this
>> topic. Looking at all the suggestions made, I think having detailed
>> mimetype extensions would be the most convenient for most parties involved:
>>
>> - It puts the responsibility of providing an exact data type for a file
>> at the side of the metadata creator/resource provider. This is always
>> better than relying on interpretation by a third-party tool.
>>
>> - It does not require changes to (CMDI) metadata profiles.
>>
>> - It makes it feasible for tool/data matching applications (WebLicht,
>> Switchboard, ...) to provide a meaningful processing application.
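How a matching application might use detailed types can be sketched as a lookup table keyed by media type, with a fallback to tools that accept generic TEI. The tool names, the table, and the fallback rule are all hypothetical placeholders, not the actual WebLicht or Switchboard configuration.

```python
from typing import Dict, List

GENERIC_TEI = "application/tei+xml"  # registered in RFC 6129

# Hypothetical registry: tool names and accepted input types are placeholders.
TOOLS_BY_TYPE: Dict[str, List[str]] = {
    "text/x-tei-isospoken+xml": ["spoken-tei-converter"],
    "text/x-tei-dta+xml": ["dta-tei-converter"],
    GENERIC_TEI: ["generic-tei-viewer"],
}


def essence(media_type: str) -> str:
    """Drop parameters and normalise case."""
    return media_type.split(";", 1)[0].strip().lower()


def is_tei_variant(mt: str) -> bool:
    """Assumed convention: TEI variants are spelled '<type>/x-tei-<variant>+xml'."""
    subtype = mt.split("/", 1)[-1]
    return subtype.startswith("x-tei-") and subtype.endswith("+xml")


def matching_tools(media_type: str) -> List[str]:
    """Variant-specific tools first, then generic TEI tools as a fallback."""
    mt = essence(media_type)
    tools = list(TOOLS_BY_TYPE.get(mt, []))
    if is_tei_variant(mt):
        tools += TOOLS_BY_TYPE.get(GENERIC_TEI, [])
    return tools


print(matching_tools("text/x-tei-dta+xml; charset=utf-8"))
```

Listing the specific tools before the generic ones is one way to let a resource provider's precise type improve the offer without hiding tools that handle any TEI file.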
>>
>> There are of course approaches on other levels too (as suggested by
>> Bart and Menzo), and these could be used in addition to the extended TEI
>> mimetypes:
>>
>> - Matching applications could still try to parse a TEI file (in the absence
>> of a detailed mime type) and guess the sub-type, using @type where
>> available. This is of course not trivial.
>>
>> - The ParameterGroup in the CMDI description can be added. But in many
>> cases that requires metadata providers to change their profiles, which
>> means quite a bit of additional work.
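The first fallback, sniffing the file itself, might look like this with Python's standard XML parser. It assumes the variant is signalled by @type on the root TEI element; the thread does not say where @type would sit, and the value "transcript" in the usage line is hypothetical. It only helps when the provider actually sets @type.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

TEI_NS = "http://www.tei-c.org/ns/1.0"


def guess_tei_subtype(data: Union[str, bytes]) -> Optional[str]:
    """Return @type of the root TEI element, or None if the input is not
    well-formed TEI or the root carries no @type.

    The whole document is parsed in memory; for large or untrusted files a
    streaming or hardened parser would be preferable.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None
    if root.tag != f"{{{TEI_NS}}}TEI":
        return None
    return root.get("type")


doc = '<TEI xmlns="http://www.tei-c.org/ns/1.0" type="transcript"><teiHeader/></TEI>'
print(guess_tei_subtype(doc))  # transcript
```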
>>
>> I will join the TEI weblicht list, and try to gather more concrete
>> information over the coming period at
>>
>> https://trac.clarin.eu/wiki/TEI%20variants
>>
>> (feel free to edit along)
>>
>> When we have that additional information, we can try to come up with
>> concrete recommendations.
>>
>> best regards,
>> --
>> Dieter Van Uytvanck
>> Technical Director CLARIN ERIC
>> www.clarin.eu | tel. +31-(0)850091363 | skype: dietervu.mpi
>> _______________________________________________
>> Dev mailing list
>> Dev at lists.clarin.eu
>> https://lists.clarin.eu/cgi-bin/mailman/listinfo/dev

