[Dev] Fwd: Re: [Clarind-devel] CLARIN-FCS: clarification about FCS schema
Oliver Schonefeld
schonefeld at ids-mannheim.de
Mon Oct 8 16:48:49 CEST 2012
Hi,
Tom forgot to include the CLARIN-EU Dev list, so I'll just forward his mail.
-------- Original Message --------
Subject: Re: [Clarind-devel] [Dev] CLARIN-FCS: clarification about FCS
schema
Date: Mon, 08 Oct 2012 15:19:47 +0200
From: Thomas Zastrow <thomas.zastrow at uni-tuebingen.de>
To: clarind-devel at mailman.sfs.uni-tuebingen.de,
"Clarind-devel at mailman.sfs.uni-tuebingen.de"
<Clarind-devel at mailman.sfs.uni-tuebingen.de>
Dear all, answers below:
On 08.10.12 12:45, Matej Durco wrote:
> Hi Oliver,
>
> thank you for these very good questions,
> pin-pointing the weak points...
> (as always ;) )
>
>> I have some more questions to clarify the issue:
>> - What is the use-case for referencing the parent-collection?
>> Or why is it useful to reference it?
> The idea was to be able to enrich a content-search with
> metadata-information
> basically a metadata record of the resource
> resolved with respect to the collection it is contained in
> (these can be deeper than just the one parent).
> This was thought to be relevant for combined metadata-content search
> But I am not aware of any well-defined use-case
> to support this.
If we don't have a concrete use case or even an idea of one - why should
we implement it ;-) ?
Just my two cents: do we really need recursiveness? As a compromise, we
could implement Oli's suggestion of a <Parent> element.
>> - What exactly is the distinction between Resource and ResourceFragment?
> Resource would refer (with its @pid and/or @ref attribute) to whatever
> is seen by the provider as a Resource,
> i.e. (according to the CMDI-definition) it has a PID and a
> Metadata-record.
> ResourceFragment would be any (referencable) part of it,
The "referenceability" of a KWIC result could be problematic.
> the referencing being relative to the Resource (fragment or part
> identifier).
> I am aware that, in practice, this is a moving target
> (and in the end, decision of the individual providers)
> so we can (have to) be pragmatic about it.
>
;-)
>> - Can there be Resources without ResourceFragment?
> yes, an image, a whole text, an audio recording....
> But yes, most usually (especially with text resources) only a fragment
> of a resource is returned.
>
This brings up the next practical question: should we set an upper limit on
the number of bytes to be sent around via the FCS?
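Purely as an illustration of what such a limit could look like on the endpoint side, here is a minimal sketch. The constant MAX_RECORD_BYTES, its value, and the function name are hypothetical; nothing in this thread or in any FCS document fixes a number.

```python
# Hypothetical limit for illustration only; no value has been agreed on.
MAX_RECORD_BYTES = 64 * 1024


def within_limit(payload: str, limit: int = MAX_RECORD_BYTES) -> bool:
    """Return True if the UTF-8 encoded payload fits within the byte limit."""
    # Count bytes, not characters: non-ASCII text takes more than one byte.
    return len(payload.encode("utf-8")) <= limit


print(within_limit("<DataView>short KWIC line</DataView>"))
```

Measuring the encoded byte length rather than the character count matters for corpora with non-ASCII text.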
>> - If I have a KWIC result, do I need to put the KWIC in a
>> ResourceFragment or would an appropriate DataView be sufficient?
> According to the original idea yes.
> ResourceFragment wouldn't be just a semantically empty element,
> it would carry the fragment-identifier
> and possibly multiple DataViews of the given Fragment
> (e.g. kwic, tcf and image)
... if there is something like a "fragment-identifier": corpora that
are queried for KWIC often offer no way to access that
piece of data (the KWIC) directly.
>
>> - What kind of DataViews are usually expected on a Resource level? Are
>> there already any ideas or use-cases?
> e.g. image
> but especially DataView for CMD-metadata would go on the Resource-level.
> Either with a @ref attribute pointing to the CMD-record, or the record
> already resolved inline.
>
>> - Is more than one ResourceFragment expected within a record
>> (bearing in mind, that each hit shall be wrapped within
>> a <sru:record>)?
> According to the "one hit per sru:record" premise there actually
> shouldn't.
> So this might be the strongest argument against the ResourceFragment.
Yes.
> However there may be endpoints, that see one matching Resource as one
> hit.
> (e.g. apache solr acts like this, and it seems to require non-trivial
> post-processing,
> to produce the result otherwise.)
> Then multiple occurrences within that resource would be individual
> Resource Fragments,
> This would lead to a somewhat inconsistent result (some endpoints
> delivering ResourceFragments, some Resources as individual hits),
> but exactly thanks to this distinction the client/aggregator could
> distinguish between those results
> and try to make it clear to the user.
I would suggest that a resource should always have at least one
ResourceFragment. Then we can handle all endpoints the same.
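As a rough sketch of this suggestion, an endpoint or aggregator could normalize results so that every Resource carries at least one ResourceFragment. The element names are taken from the snippet quoted in this thread, and the helper name ensure_fragment is hypothetical. The sketch moves every Resource-level DataView into the new fragment; whether metadata DataViews should instead stay at the Resource level is left open here.

```python
import xml.etree.ElementTree as ET


def ensure_fragment(resource: ET.Element) -> ET.Element:
    """Wrap Resource-level DataViews in a ResourceFragment if none exists.

    Hypothetical helper: element names follow the example quoted in this
    thread, not a published FCS schema.
    """
    if resource.find("ResourceFragment") is not None:
        return resource  # already has at least one fragment
    fragment = ET.Element("ResourceFragment")
    # Move every direct DataView child into the new fragment.
    for view in list(resource.findall("DataView")):
        resource.remove(view)
        fragment.append(view)
    resource.append(fragment)
    return resource


# A Resource returned as a whole (e.g. an image) with no fragment.
whole = ET.fromstring('<Resource pid="p1"><DataView>image</DataView></Resource>')
ensure_fragment(whole)
print(ET.tostring(whole, encoding="unicode"))
```

With this in place, a client only has to handle one shape of result, regardless of whether the endpoint counts a whole Resource or a fragment as a hit.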
>
>> If we just need to reference the parent, why not just introduce a
>> single (optional?) <Parent> element per Resource, that has the @pid
>> and/or @ref to the parent resource, e.g.
>> <Resource pid="{resource-pid}" ref="{resource-ref}">
>>   <Parent pid="{parent-pid}" ref="{parent-ref}" />
>>   <ResourceFragment ..>
>>     <DataView mime-type="application/x-clarin-fcs-kwic+xml">
>>       ...
>>     </DataView>
>>   </ResourceFragment>
>> </Resource>
>
> As explained earlier, this can be more than just the parent,
> and it seemed cleaner to continue working with the Resource
> (as the collections are resources themselves).
>
I think the parent should be enough; we should not put too much metadata
into the protocol here.
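To make the single-parent variant concrete, here is a minimal sketch that builds a Resource with at most one optional Parent child instead of nested Resources. The function name build_resource and the example identifiers are hypothetical; the element and attribute names follow the snippet quoted above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_resource(pid: str, ref: str,
                   parent_pid: Optional[str] = None,
                   parent_ref: Optional[str] = None) -> ET.Element:
    """Build a Resource element with at most one flat Parent reference."""
    resource = ET.Element("Resource", {"pid": pid, "ref": ref})
    # Parent is optional: only emit it if some parent reference is known.
    if parent_pid or parent_ref:
        attrs = {}
        if parent_pid:
            attrs["pid"] = parent_pid
        if parent_ref:
            attrs["ref"] = parent_ref
        ET.SubElement(resource, "Parent", attrs)
    return resource


# Placeholder identifiers, for illustration only.
res = build_resource("res-1", "http://example.org/res-1", parent_pid="coll-1")
print(ET.tostring(res, encoding="unicode"))
```

A flat reference like this keeps the record small; a client that needs the full collection hierarchy would have to resolve the parent's pid or ref itself.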
Best,
Tom
--
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen
Wilhelmstr. 19
D-72074 Tuebingen
http://www.thomas-zastrow.de
Tel.: 07071/29-73968
Fax: 07071/29-5214