[Dev] CLARIN-FCS: clarification about FCS schema

Matej Durco xnrn at gmx.net
Mon Oct 8 12:45:15 CEST 2012


Hi Oliver,

thank you for these very good questions,
pinpointing the weak points...
(as always ;) )

> I have some more questions to clarify the issue:
> - What is the use-case for referencing the parent-collection?
>   Or why is it useful to reference it?
The idea was to be able to enrich a content search with metadata information,
basically a metadata record of the resource,
resolved with respect to the collection it is contained in
(the collection hierarchy can be deeper than just the one parent).
This was thought to be relevant for combined metadata-content search,
but I am not aware of any well-defined use case
to support this.

> - What exactly is the distinction between Resource and ResourceFragment?
Resource would refer (with its @pid and/or @ref attribute) to whatever
is seen by the provider as a Resource,
i.e. (according to the CMDI definition) it has a PID and a metadata record.
ResourceFragment would be any (referenceable) part of it,
the referencing being relative to the Resource (fragment or part
identifier).
I am aware that in practice this is a moving target
(and, in the end, the decision of the individual providers),
so we can (have to) be pragmatic about it.

> - Can there be Resources without ResourceFragment?
Yes: an image, a whole text, an audio recording...
But most usually (especially with text resources) only a fragment
of a resource is returned.

> - If I have a KWIC result, do I need to put the KWIC in a
>   ResourceFragment or would an appropriate DataView sufficient?
According to the original idea, yes, the KWIC would go into a ResourceFragment.
ResourceFragment wouldn't be just a semantically empty element:
it would carry the fragment identifier
and possibly multiple DataViews of the given fragment
(e.g. kwic, tcf and image).
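
Just as a rough sketch (not a definitive implementation) of how a client
could read such a fragment, in Python. The attribute names (`ref`,
`mime-type`), the fragment identifier `#s42`, the PID, and the two MIME types
other than the KWIC one are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: attribute names and all values except the KWIC
# MIME type are assumptions for illustration, not taken from the schema.
SAMPLE = """
<Resource pid="hdl:0000/example">
  <ResourceFragment ref="#s42">
    <DataView mime-type="application/x-clarin-fcs-kwic+xml">...</DataView>
    <DataView mime-type="text/tcf+xml">...</DataView>
    <DataView mime-type="image/png">...</DataView>
  </ResourceFragment>
</Resource>
"""

def fragment_views(resource_xml):
    """Map each fragment identifier to the DataView MIME types it offers."""
    root = ET.fromstring(resource_xml)
    return {
        frag.get("ref"): [dv.get("mime-type") for dv in frag.findall("DataView")]
        for frag in root.findall("ResourceFragment")
    }

print(fragment_views(SAMPLE))
```

So the fragment is the place where the identifier lives, and the DataViews
are just alternative renderings of that same fragment.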

> - What kind of DataViews are usually expected on a Resource level? Are
>   there already any ideas or use-cases?
E.g. an image,
but especially a DataView for CMD metadata would go on the Resource level,
either with a @ref attribute pointing to the CMD record, or with the record already
resolved inline.
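
A minimal sketch of how a client might handle both variants, in Python.
The `mime-type` attribute name, the MIME type value `application/x-cmdi+xml`,
the example URL, and the bare `<CMD>` placeholder element are assumptions for
illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed MIME type for the CMD metadata DataView (hypothetical here).
CMD_TYPE = "application/x-cmdi+xml"

# Hypothetical Resource with the metadata given by reference ...
BY_REF = """
<Resource pid="hdl:0000/example">
  <DataView mime-type="application/x-cmdi+xml" ref="http://example.org/cmd/record1"/>
</Resource>
"""

# ... and with the record already resolved inline.
INLINE = """
<Resource pid="hdl:0000/example">
  <DataView mime-type="application/x-cmdi+xml"><CMD>...</CMD></DataView>
</Resource>
"""

def metadata_view(resource_xml):
    """Return ("ref", url), ("inline", element) or None for a Resource."""
    root = ET.fromstring(resource_xml)
    for dv in root.findall("DataView"):
        if dv.get("mime-type") != CMD_TYPE:
            continue
        if dv.get("ref"):
            return ("ref", dv.get("ref"))
        if len(dv) > 0:  # the DataView has a child element: inline record
            return ("inline", dv[0])
    return None

print(metadata_view(BY_REF))
print(metadata_view(INLINE)[0])
```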

> - Are there more than one ResourceFragment expected within a record
>   (bearing in mind, that each hit shall be wrapped within
>    a <sru:record>)?
According to the "one hit per sru:record" premise there actually shouldn't be.
So this might be the strongest argument against the ResourceFragment.
However, there may be endpoints that see one matching Resource as one hit
(e.g. Apache Solr acts like this, and it seems to require non-trivial
post-processing
to produce the result otherwise).
Then multiple occurrences within that resource would be individual
ResourceFragments.
This would lead to a somewhat inconsistent result (some endpoints
delivering ResourceFragments, some Resources as individual hits),
but exactly thanks to this distinction the client/aggregator could
distinguish between those results
and try to make it clear to the user.
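
How the client/aggregator could make that distinction, as a rough Python
sketch. The heuristic itself (exactly one ResourceFragment means the fragment
is the hit, anything else means the whole Resource is the hit) is my
assumption, as are the `ref` attribute and the sample values; a Resource-level
endpoint with a single occurrence would look like a fragment hit here.

```python
import xml.etree.ElementTree as ET

# Hypothetical payloads of two sru:record elements.
FRAGMENT_HIT = """
<Resource pid="hdl:0000/a">
  <ResourceFragment ref="#s1"/>
</Resource>
"""

RESOURCE_HIT = """
<Resource pid="hdl:0000/b">
  <ResourceFragment ref="#s1"/>
  <ResourceFragment ref="#s7"/>
</Resource>
"""

def hit_granularity(resource_xml):
    """Classify a record payload and count the fragments it carries.

    Returns ("fragment", 1) when the Resource wraps exactly one
    ResourceFragment, otherwise ("resource", n) with n fragments
    (n may be 0, e.g. for an image returned as a whole).
    """
    root = ET.fromstring(resource_xml)
    n = len(root.findall("ResourceFragment"))
    return ("fragment", 1) if n == 1 else ("resource", n)

print(hit_granularity(FRAGMENT_HIT))
print(hit_granularity(RESOURCE_HIT))
```

With something like this the aggregator could, for instance, label results
from the second kind of endpoint as "n occurrences in one resource".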

> If we just need to reference the parent, why not just introduce a 
> single (optional?) <Parent> element per Resource, that has the @pid 
> and/or @ref to the parent resource, e.g.
> <Resource pid="{resource-pid}" ref="{resource-ref}">
>   <Parent pid="{parent-pid}" ref="{parent-ref}" />
>   <ResourceFragment ..>
>      <DataView mime-type="application/x-clarin-fcs-kwic+xml">
>         ...
>      </DataView>
>   </ResourceFragment>
> </Resource>

As explained earlier, this can be more than just the parent,
and it seemed cleaner to continue working with the Resource
(as the collections are resources themselves).


best,
Matej

