[Dev] Fwd: Re: [Clarind-devel] CLARIN-FCS: clarification about FCS schema

Oliver Schonefeld schonefeld at ids-mannheim.de
Mon Oct 8 16:48:49 CEST 2012


Hi,

Tom forgot to include the CLARIN-EU Dev list, so I'll just forward his mail.

-------- Original Message --------
Subject: Re: [Clarind-devel] [Dev] CLARIN-FCS: clarification about FCS
schema
Date: Mon, 08 Oct 2012 15:19:47 +0200
From: Thomas Zastrow <thomas.zastrow at uni-tuebingen.de>
To: clarind-devel at mailman.sfs.uni-tuebingen.de,
"Clarind-devel at mailman.sfs.uni-tuebingen.de"
<Clarind-devel at mailman.sfs.uni-tuebingen.de>

Dear all, answers below:

On 08.10.12 12:45, Matej Durco wrote:
> Hi Oliver,
>
> thank you for these very good questions,
> pin-pointing the weak points...
> (as always ;) )
>
>> I have some more questions to clarify the issue:
>> - What is the use-case for referencing the parent-collection?
>>   Or why is it useful to reference it?
> The idea was to be able to enrich a content-search with 
> metadata-information
> basically a metadata record of the resource
> resolved with respect to the collection it is contained in
> (these can be deeper than just the one parent).
> This was thought to be relevant for combined metadata-content search
> But I am not aware of any well-defined use-case
> to support this.
If we don't have a concrete use case or even an idea of one - why should
we implement it ;-) ?

Just my two cents: do we really need recursion? As a compromise, we
could implement Oli's suggestion of a <Parent> element.


>> - What exactly is the distinction between Resource and ResourceFragment?
> Resource would refer (with its @pid and/or @ref attribute) to whatever 
> is seen by the provider as a Resource,
> i.e. (according to the CMDI-definition) it has a PID and a 
> Metadata-record.
> ResourceFragment would be any (referenceable) part of it,

The "referenceability" of a KWIC result could be problematic.


> the referencing being relative to the Resource (fragment or part 
> identifier).
> I am aware, that, in practice this is a moving target
> (and in the end, decision of the individual providers)
> so we can (have to) be pragmatic about it.
>
;-)

>> - Can there be Resources without ResourceFragment?
> Yes: an image, a whole text, an audio recording...
> But yes, most usually (especially with text resources) only a fragment 
> of a resource is returned.
>
That brings up the next practical question: should we set an upper limit
on the number of bytes sent around via the FCS?

>> - If I have a KWIC result, do I need to put the KWIC in a
>>   ResourceFragment or would an appropriate DataView be sufficient?
> According to the original idea yes.
> ResourceFragment wouldn't be just a semantically empty element,
> it would carry the fragment-identifier
> and possibly multiple DataViews of the given Fragment
> (e.g. kwic, tcf and image)

... if there is something like a "fragment-identifier": corpora that
are queried for KWIC often do not offer a way to access that
piece of data (the KWIC) directly.

>
>> - What kind of DataViews are usually expected on a Resource level? Are
>>   there already any ideas or use-cases?
> e.g. image
> but especially DataView for CMD-metadata would go on the Resource-level.
> Either with a @ref attribute pointing to the CMD-record, or with the record 
> already resolved inline.
>
>> - Is more than one ResourceFragment expected within a record
>>   (bearing in mind, that each hit shall be wrapped within
>>    a <sru:record>)?
> According to the "one hit per sru:record" premise there actually 
> shouldn't.
> So this might be the strongest argument against the ResourceFragment.
Yes.

> However there may be endpoints, that see one matching Resource as one 
> hit.
> (e.g. Apache Solr acts like this, and it seems to require non-trivial 
> post-processing
> to produce the result otherwise.)
> Then multiple occurrences within that resource would be individual 
> ResourceFragments.
> This would lead to a somewhat inconsistent result (some endpoints 
> delivering ResourceFragments, some Resources as individual hits),
> but exactly thanks to this distinction the client/aggregator could 
> distinguish between those results
> and try to make it clear to the user.
I would suggest that a resource should always have at least one
ResourceFragment. Then we can handle all endpoints the same way.
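
A minimal sketch of how a client or aggregator might enforce this "at least
one ResourceFragment" rule on incoming data. Everything beyond the element
names from this thread is an assumption made for illustration: the helper name
ensure_fragment, the un-namespaced elements, the "mime-type" attribute name,
and the choice to treat only KWIC data views as hit-level (so that a
Resource-level view such as CMD metadata stays where it is).

```python
import xml.etree.ElementTree as ET

# Assumption: data views of these types describe a hit, not the whole resource.
HIT_LEVEL_TYPES = {"application/x-clarin-fcs-kwic+xml"}


def ensure_fragment(resource):
    """Guarantee that a Resource element has at least one ResourceFragment.

    If the endpoint already sent a fragment, the element is left untouched.
    Otherwise a new fragment is appended and hit-level DataViews that sit
    directly under the Resource are moved into it.
    """
    if resource.find("ResourceFragment") is not None:
        return resource
    fragment = ET.SubElement(resource, "ResourceFragment")
    # findall() without a path prefix only matches direct children.
    for view in list(resource.findall("DataView")):
        if view.get("mime-type") in HIT_LEVEL_TYPES:
            resource.remove(view)
            fragment.append(view)
    return resource


if __name__ == "__main__":
    hit = ET.fromstring(
        '<Resource pid="{resource-pid}">'
        '<DataView mime-type="application/x-clarin-fcs-kwic+xml">...</DataView>'
        "</Resource>"
    )
    print(ET.tostring(ensure_fragment(hit), encoding="unicode"))
```

With such a step in place, downstream code could always iterate over
Resource/ResourceFragment, regardless of what an endpoint counts as a hit.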

>
>> If we just need to reference the parent, why not just introduce a 
>> single (optional?) <Parent> element per Resource, that has the @pid 
>> and/or @ref to the parent resource, e.g.
>> <Resource pid="{resource-pid}" ref="{resource-ref}">
>>   <Parent pid="{parent-pid}" ref="{parent-ref}" />
>>   <ResourceFragment ..>
>>      <DataView mime-type="application/x-clarin-fcs-kwic+xml">
>>         ...
>>      </DataView>
>>   </ResourceFragment>
>> </Resource>
>
> As explained earlier, this can be more than just the parent,
> and it seemed cleaner to continue working with the Resource
> (as the collections are resources themselves).
>
I think the parent should be enough; we should not put too much metadata
into the protocol.
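
To illustrate how little a client would need in order to consume a single
optional <Parent> element, here is a rough sketch. It assumes the element and
attribute names from Oli's example, no XML namespaces, and a hypothetical
helper called parent_reference; none of this is an agreed schema.

```python
import xml.etree.ElementTree as ET

# Sample record modelled on the example in this thread; the placeholder
# values in curly braces are kept as-is.
SAMPLE = """
<Resource pid="{resource-pid}" ref="{resource-ref}">
  <Parent pid="{parent-pid}" ref="{parent-ref}" />
  <ResourceFragment>
    <DataView mime-type="application/x-clarin-fcs-kwic+xml">...</DataView>
  </ResourceFragment>
</Resource>
"""


def parent_reference(resource):
    """Return (pid, ref) of the optional Parent child, or None if absent.

    Either attribute may be missing, in which case its slot is None.
    """
    parent = resource.find("Parent")
    if parent is None:
        return None
    return parent.get("pid"), parent.get("ref")


if __name__ == "__main__":
    resource = ET.fromstring(SAMPLE)
    print(parent_reference(resource))  # ('{parent-pid}', '{parent-ref}')
```

Anything deeper than the direct parent would then have to be resolved through
the metadata record behind the @pid or @ref, outside of the search protocol.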

Best,

Tom



-- 
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214





