[Dev] CLARIN-FCS: clarification about FCS schema

Matej Durco xnrn at gmx.net
Mon Oct 1 16:28:48 CEST 2012


Hi,

ad 1)
I too vote for c).
(The @type attribute was introduced
because the initial values (kwic, fulltext)
did not seem to fit in the mime-type domain.
But as we can define our own mime-types,
that sounds like the most proper way.)
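
Just as an illustration (a sketch, not a schema proposal): with option c)
a kwic and a KML DataView might look like this, using the mime-types
mentioned in Oliver's mail below; the element content is left out:

<fcs:DataView mime-type="application/x-clarin-fcs-kwic+xml"> ... </fcs:DataView>
<fcs:DataView mime-type="application/vnd.google-earth.kml+xml"> ... </fcs:DataView>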

ad 2)
The recursiveness was introduced to allow
referencing the parent collections of the (matching) Resource
(including (potentially resolved) references to the CMD-record):

<sru:recordData>
  <fcs:Resource pid="{ancestor-collection}">
    <fcs:Resource pid="{parent-collection}">
      <fcs:DataView mime-type="application/x-clarin-cmd+xml"
                    ref="{cmd-url}" />
      <fcs:Resource pid="{matching-resource-handle}">
        <fcs:ResourceFragment pid="{fragment-identifier}">
          <fcs:DataView mime-type="application/x-clarin-fcs-kwic+xml">... </fcs:DataView>
        </fcs:ResourceFragment>
      </fcs:Resource>
    </fcs:Resource>
  </fcs:Resource>
</sru:recordData>

It is problematic insofar as, if there are multiple matching Resources
within one collection,
each of them should still be put in a separate hit (<sru:record>).
So admittedly this is practically not applicable:
<sru:recordData>
  <fcs:Resource pid="{collection-handle}">
    <fcs:Resource pid="{res1-pid}" > ... </fcs:Resource>
    <fcs:Resource pid="{res2-pid}" > ... </fcs:Resource>
  </fcs:Resource>
</sru:recordData>
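
What would be returned instead is roughly the following. This is only a
sketch, assuming the collection wrapper is simply repeated in every hit;
the other children of <sru:record> are omitted:

<sru:record>
  <sru:recordData>
    <fcs:Resource pid="{collection-handle}">
      <fcs:Resource pid="{res1-pid}" > ... </fcs:Resource>
    </fcs:Resource>
  </sru:recordData>
</sru:record>
<sru:record>
  <sru:recordData>
    <fcs:Resource pid="{collection-handle}">
      <fcs:Resource pid="{res2-pid}" > ... </fcs:Resource>
    </fcs:Resource>
  </sru:recordData>
</sru:record>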

I would still vote for (corrected) recursiveness, unless there is an
alternative proposal for referencing the parent collections.


best,
matej

On 01.10.2012 14:12, Oliver Schonefeld wrote:
> [X-Posted to CLARIN-D developers]
>
> Hi,
>
> while building a SRU client for FCS, I revisited the current CLARIN-FCS
> record schema [1].
> I've got two issues with the current schema that I'd like to discuss
> with interested developers:
>
> 1) [minor] The dataview type currently allows only three values
>     ("kwic", "fulltext", "image"). Some endpoints, e.g. Meertens, also
>     have a DataView for KML. However, the "kml" is currently not within
>     the set of allowed values, thus resulting in invalid XML.
>     We have several options to deal with this:
>     a) add "kml" to the list of allowed values (and do this every time a
>        new dataview pops up; including bumping the version number of the
>        schema)
>     b) get rid of the predefined values and define the attribute value to be
>        of type xs:NMTOKEN (or something similar)
>     c) drop the @type attribute in favor of a proper @mime-type
>        attribute. For our own types (e.g. kwic) we could define
>        non-standard mime types (cf. RFC 2045, RFC 4288), e.g. like
>        "application/x-clarin-fcs-kwic+xml"
>
>        (SN: KML has an officially registered mime-type:
>             "application/vnd.google-earth.kml+xml")
>
>     BTW, I'd vote for solution c ...
>
>
> 2) [major] "Resource" is currently defined semi-recursively:
>     <xs:complexType name="ResourceType">
>       <xs:sequence>
>         <xs:element maxOccurs="unbounded" minOccurs="0"
>               name="Resource" type="fcs:ResourceType"/>
>         <xs:element maxOccurs="unbounded" minOccurs="0"
>               name="DataView" type="fcs:DataViewType"/>
>         <xs:element maxOccurs="unbounded" minOccurs="0"
>               name="ResourceFragment" type="fcs:ResourceFragmentType"/>
>       </xs:sequence>
>       <xs:attribute name="pid" type="fcs:pidType" use="optional"/>
>       <xs:attribute name="ref" type="fcs:refType" use="optional"/>
>     </xs:complexType>
>     Since maxOccurs defaults to 1 (not "unbounded"), the definition of
>     the type in the XSD allows for structures where a Resource may have
>     zero-or-one Resource as child, thus forming structures like
>     (namespaces and other elements omitted for brevity):
>       <Resource ...>
>         <Resource ...>
>           <Resource ...>
>             <Resource ...>
>               <!-- ad infinitum -->
>             </Resource>
>           </Resource>
>         </Resource>
>       </Resource>
>     However, it does not allow Resource elements with more than one
>     Resource element as child, like:
>       <Resource ...>
>         <Resource ...>
>         </Resource>
>         <Resource ...>
>         </Resource>
>       </Resource>
>
>     The first structure does not really make sense to me, while one
>     could argue that the second could be used to produce a structured
>     result in the form of a (sub-)corpus.
>     My suggestion is either to drop the recursiveness or to define it
>     properly (including some real-world use cases showing why it is needed).
>
>     BTW, I'd vote for dropping the recursiveness ...
>
> Comments, Ideas, Thoughts?
>
> Best,
>    Oliver
>
> [1] http://trac.clarin.eu/browser/FederatedSearch/Resource.xsd


