[Dev] retrieving the CQL endpoints from the center registry

Thu Aug 16 11:56:54 CEST 2012

Hi Matej,

Am 16.08.12 11:41, schrieb Matej Durco:
>
> haven't we introduced the x-context parameter exactly for this,
> to allow telling the endpoint to query only against a given corpus?
> For this the endpoint has to expose the list of corpora (identified 
> with (P)IDs)
> and subsequently be able to resolve the ID given in the x-context 
> parameter to corresponding corpus.

Yes, you are right, but so far not many endpoints support that
functionality in practice. Still, I have already integrated it into my
code.
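For illustration, a corpus-restricted request might be built as below. This is a minimal sketch, assuming an SRU 1.2 searchRetrieve request (operation, version, query) in which the x-context parameter carries the corpus (P)ID. The helper name build_search_url and the corpus ID in the usage lines are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(endpoint: str, query: str, context: Optional[str] = None) -> str:
    """Build an SRU 1.2 searchRetrieve URL, optionally restricted to one corpus.

    build_search_url is a hypothetical helper, not part of any FCS library.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,  # the CQL query string
    }
    if context is not None:
        # x-context names the corpus ((P)ID) the endpoint should search;
        # support for it still varies between endpoints.
        params["x-context"] = context
    # urlencode keeps the dict's insertion order and encodes the values
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(
        "http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru",
        "Haus",
        context="hypothetical-corpus-pid",  # placeholder ID
    ))
```

Leaving context as None yields a plain request, so the same code path works for endpoints that do not support x-context yet.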

>
>>
>>
>>> - have a collection record per endpoint (a CMDI giving a language list,
>>> modality, etc. for each corpus) to which we can refer in the center
>>> registry or from the scan response
>>>
>>> I think I would like the last option the most, as it is relatively
>>> light-weight, not too hard to make and it would also be in the hands of
>>> the centers providing the end points (instead of being hardcoded). What
>>> is your opinion?
>>
>> I don't fully understand what you mean, but I think it would be 
>> good to have all the necessary information in one place. I'm 
>> parsing the center registry now to find the endpoints; can't we add 
>> this information there:
>>
>> <WebReference>
>>   <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
>>   <Description>CQL</Description>
>>   <lang>de</lang>
>> </WebReference>
>>
>> As long as we don't store information about individual corpora here, 
>> every endpoint can only serve one language. But I don't think this 
>> is a problem, because every center can define as many endpoints as 
>> it wants.
>
> Perhaps we can agree to handle language specially, because of its
> importance, but in general I agree with Dieter here: the conceptually
> soundest approach, because it is in line with the existing
> infrastructure (CMDI), seems to be to provide CMDI records for
> collections and to establish links (via ResourceProxy) between the
> endpoint and its corpora/collections.
> Especially since later we will want to filter by information other
> than language, and where would we stop duplicating this information
> from the collection records into the endpoint record?
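To make the registry option concrete, here is a rough sketch of how the aggregator could collect endpoints from WebReference entries like the one quoted above. It assumes the registry XML uses exactly those element names without namespaces, that CQL endpoints are marked by a Description of "CQL", and that <lang> is the proposed (not yet existing) element. The wrapper <Center> element, Endpoint and find_cql_endpoints are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Endpoint(NamedTuple):
    url: str
    lang: Optional[str]  # None when the entry carries no <lang>


def find_cql_endpoints(registry_xml: str) -> List[Endpoint]:
    """Collect every WebReference whose Description is 'CQL'."""
    root = ET.fromstring(registry_xml)
    endpoints = []
    # iter() also visits the root itself, so a bare <WebReference> works too
    for ref in root.iter("WebReference"):
        if (ref.findtext("Description") or "").strip() != "CQL":
            continue  # not an SRU/CQL endpoint
        # strip() also removes line breaks inside <Website>...</Website>
        url = (ref.findtext("Website") or "").strip()
        lang = ref.findtext("lang")  # proposed element, may be missing
        endpoints.append(Endpoint(url, lang.strip() if lang else None))
    return endpoints


SAMPLE = """<Center>
  <WebReference>
    <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
    <Description>CQL</Description>
    <lang>de</lang>
  </WebReference>
</Center>"""

if __name__ == "__main__":
    for ep in find_cql_endpoints(SAMPLE):
        print(ep.url, ep.lang)
```

Endpoints without a <lang> element come back with lang set to None, so they can still be listed, just not filtered by language.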

Yes and no. I'm implementing the aggregator at the moment, and there I
need the language information so that the user can select the language
in which a query should run. The aggregator can be configured in two
ways:

a) From the "outer" world: for example, the VLO can link directly from
a specific resource or bundle of resources to the aggregator. These
resources are then automatically preconfigured for the query.
b) Second, the aggregator is also a (graphical) user interface to the
whole FCS. That means it has to offer all resources available in the
FCS to the user, who can then decide which ones to query. In this case,
I need the language information *directly* at the endpoints, because I
don't want the aggregator to become another VLO-"engine" that harvests
CMDI files from all the centers ;-)
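The two modes could feed a single selection step, sketched below under these assumptions: each resource is known by a (P)ID, its endpoint URL, and the languages advertised directly by that endpoint; mode a) arrives as a set of preselected IDs (e.g. from a VLO link) and mode b) as an optional language chosen by the user. Resource and select_resources are hypothetical names, not part of any existing aggregator code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Resource:
    pid: str               # (P)ID of the corpus, as used in x-context
    endpoint: str          # SRU endpoint serving this corpus
    langs: FrozenSet[str]  # languages advertised by the endpoint itself


def select_resources(
    available: Iterable[Resource],
    preselected: Optional[Set[str]] = None,
    lang: Optional[str] = None,
) -> List[Resource]:
    """Pick the resources a query should run against.

    a) preselected: IDs handed over from outside (e.g. a VLO link);
       None means no preselection, so every resource is a candidate.
    b) lang: the language the user picked; None means any language.
    """
    chosen = []
    for res in available:
        if preselected is not None and res.pid not in preselected:
            continue  # not part of the preconfigured bundle
        if lang is not None and lang not in res.langs:
            continue  # endpoint does not advertise the requested language
        chosen.append(res)
    return chosen


if __name__ == "__main__":
    catalog = [
        Resource("pid-a", "http://example.org/sru", frozenset({"de"})),
        Resource("pid-b", "http://example.org/sru", frozenset({"de", "nl"})),
    ]
    print([r.pid for r in select_resources(catalog, lang="nl")])  # ['pid-b']
```

The filter only reads the langs field, so it does not care whether the languages came from the registry entry or from the scan response, as long as no CMDI harvesting is needed to fill it.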

Yesterday, Oli proposed using "extraTermData" for that, which makes
sense in my opinion:

<sru:extraTermData>
  <fcs-scan:lang xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
    de
  </fcs-scan:lang>

  <!-- or, alternatively, a variant with several languages -->
  <fcs-scan:langs xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
    <fcs-scan:lang>de</fcs-scan:lang>
    <fcs-scan:lang>nl</fcs-scan:lang>
  </fcs-scan:langs>
</sru:extraTermData>
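On the aggregator side, reading either variant out of a scan term could look roughly like the sketch below. It assumes the SRU 1.2 response namespace (http://www.loc.gov/zing/srw/) for the sru prefix and uses the fcs-scan namespace URI exactly as written in the proposal; term_languages and the sample term are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {
    "sru": "http://www.loc.gov/zing/srw/",       # assumed SRU 1.2 namespace
    "fcs-scan": "http://ww.clarin.eu/fcs/scan",  # copied verbatim from the proposal
}


def term_languages(term: ET.Element) -> List[str]:
    """Return the languages listed in a scan term's extraTermData.

    Handles both proposed variants: a bare <fcs-scan:lang> and several
    <fcs-scan:lang> elements wrapped in <fcs-scan:langs>.
    """
    extra = term.find("sru:extraTermData", NS)
    if extra is None:
        return []
    candidates = extra.findall("fcs-scan:lang", NS) + extra.findall(
        "fcs-scan:langs/fcs-scan:lang", NS
    )
    langs: List[str] = []
    for el in candidates:
        code = (el.text or "").strip()  # the single variant has surrounding whitespace
        if code and code not in langs:  # skip empties and duplicates, keep order
            langs.append(code)
    return langs


SINGLE = """<sru:term xmlns:sru="http://www.loc.gov/zing/srw/">
  <sru:value>hypothetical-corpus-pid</sru:value>
  <sru:extraTermData>
    <fcs-scan:lang xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
      de
    </fcs-scan:lang>
  </sru:extraTermData>
</sru:term>"""

if __name__ == "__main__":
    print(term_languages(ET.fromstring(SINGLE)))  # ['de']
```

Accepting both shapes keeps the aggregator tolerant while the endpoints settle on one of the two variants.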

Best,

tom

-- 
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214