[Dev] retrieving the CQL endpoints from the center registry

Thomas Zastrow thomas.zastrow at uni-tuebingen.de
Thu Aug 16 12:09:57 CEST 2012


Hi Marc,

Here is an example of how this is used by our colleagues in Leipzig. The
scan operation looks like this:

http://clarinws.informatik.uni-leipzig.de:8080/CQL?operation=scan&scanClause=fcs.resource&version=1.2

This tells me which corpora are available, and then
I can query them individually:

http://clarinws.informatik.uni-leipzig.de:8080/CQL?operation=searchRetrieve&version=1.2&query=Boppard&x-context=11858/00-229C-0000-0001-B070-E

where "x-context" can also contain a list of resources.
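The two requests above can also be built programmatically. The sketch below is only an illustration and not part of any FCS library: the helper names `scan_url` and `search_url` are hypothetical, and joining several resource IDs with commas in `x-context` is an assumption, since the message only says the parameter can hold a list.

```python
from urllib.parse import urlencode
from typing import List

# Base URL of the Leipzig endpoint, taken from the examples above.
BASE_URL = "http://clarinws.informatik.uni-leipzig.de:8080/CQL"


def scan_url(base_url: str, version: str = "1.2") -> str:
    """Build an SRU scan request that lists the resources (corpora) of an endpoint."""
    params = {"operation": "scan", "scanClause": "fcs.resource", "version": version}
    return f"{base_url}?{urlencode(params)}"


def search_url(base_url: str, query: str, resources: List[str], version: str = "1.2") -> str:
    """Build an SRU searchRetrieve request restricted to the given resources.

    Assumption (hypothetical): several resource IDs are comma-separated in x-context.
    """
    params = {"operation": "searchRetrieve", "version": version, "query": query}
    if resources:
        params["x-context"] = ",".join(resources)
    # Keep "/" and "," unescaped so the URL matches the hand-written one above.
    return f"{base_url}?{urlencode(params, safe='/,')}"


print(scan_url(BASE_URL))
print(search_url(BASE_URL, "Boppard", ["11858/00-229C-0000-0001-B070-E"]))
```

The first call reproduces the scan URL above and the second one the searchRetrieve URL; passing more than one ID to `search_url` relies on the comma assumption.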

Best,

Tom




On 16.08.12 12:03, Marc Kemps-Snijders wrote:
> For your information,
> the Mimore service currently supports search across three resources
> (MAND, SAND and DIDDD), so the x-context parameter is of direct
> interest to us as an endpoint parameter.
>
> Marc
>
> On Aug 16, 2012, at 11:56 AM, Thomas Zastrow wrote:
>
>> Hi Matej,
>>
>> On 16.08.12 11:41, Matej Durco wrote:
>>>
>>> Haven't we introduced the x-context parameter exactly for this,
>>> to tell the endpoint to query only a given corpus?
>>> For this, the endpoint has to expose the list of corpora (identified
>>> with (P)IDs)
>>> and subsequently be able to resolve the ID given in the x-context
>>> parameter to the corresponding corpus.
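A minimal sketch of the endpoint side described above, assuming the endpoint keeps a table from (P)IDs to its corpora, that several IDs in `x-context` are comma-separated, and that a request without `x-context` searches every corpus. The table contents and the function name `resolve_context` are hypothetical.

```python
from typing import List, Optional

# Hypothetical corpora exposed by this endpoint, keyed by (P)ID.
# These IDs and names are placeholders, not real handles.
CORPORA = {
    "example/corpus-1": "Corpus One",
    "example/corpus-2": "Corpus Two",
}


def resolve_context(x_context: Optional[str]) -> List[str]:
    """Map an x-context value to the corpora that should be searched.

    No x-context means "search every corpus" (an assumption). Several IDs
    are assumed to be comma-separated; an unknown ID is reported as an error.
    """
    if not x_context:
        return list(CORPORA.values())
    selected = []
    for pid in (part.strip() for part in x_context.split(",")):
        if pid not in CORPORA:
            raise KeyError(f"unknown resource in x-context: {pid}")
        selected.append(CORPORA[pid])
    return selected


print(resolve_context("example/corpus-2"))  # ['Corpus Two']
```

Rejecting unknown IDs, rather than silently ignoring them, is a design choice here; an endpoint could equally return a diagnostic in its SRU response.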
>>
>> Yes, you are right, but so far, in practice, not many endpoints
>> support that functionality. I have already integrated it into my
>> code, though.
>>
>>>
>>>>
>>>>
>>>>> - have a collection record per endpoint (a CMDI record giving a
>>>>> language list, modality, etc. for each corpus) to which we can refer
>>>>> in the center registry or from the scan response
>>>>>
>>>>> I think I would like the last option the most, as it is relatively
>>>>> lightweight, not too hard to make, and it would also be in the hands
>>>>> of the centers providing the endpoints (instead of being hardcoded).
>>>>> What is your opinion?
>>>>
>>>> I don't exactly understand what you mean, but I think it would be
>>>> good to have all the necessary information in one place. I'm now
>>>> parsing the center registry to find the endpoints; can't we add
>>>> this information there:
>>>>
>>>> <WebReference>
>>>>   <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
>>>>   <Description>CQL</Description>
>>>>   <lang>de</lang>
>>>> </WebReference>
>>>>
>>>> If we don't store information about individual corpora here, each
>>>> endpoint can only serve one language. But I don't think this is a
>>>> problem, because every center can define as many endpoints as it
>>>> wants.
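A sketch of reading CQL endpoints, together with the proposed `<lang>` element, from registry entries shaped like the `WebReference` fragment above. The real center registry format (root element, XML namespaces) may differ, and `<lang>` is only a proposal at this point, so the code treats it as optional. The function `cql_endpoints` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Made-up registry fragment in the shape quoted above; the real registry
# may use another root element and XML namespaces.
REGISTRY_XML = """
<Center>
  <WebReference>
    <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru
    </Website><Description>CQL</Description>
    <lang>de</lang>
  </WebReference>
  <WebReference>
    <Website>http://example.org/other-service</Website>
    <Description>Homepage</Description>
  </WebReference>
</Center>
"""


def cql_endpoints(xml_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (endpoint URL, language or None) for every CQL WebReference."""
    root = ET.fromstring(xml_text.strip())
    endpoints = []
    for ref in root.iter("WebReference"):
        if (ref.findtext("Description") or "").strip() != "CQL":
            continue  # not a CQL/SRU endpoint
        url = (ref.findtext("Website") or "").strip()
        lang = ref.findtext("lang")  # proposed element, may be absent
        endpoints.append((url, lang.strip() if lang else None))
    return endpoints


print(cql_endpoints(REGISTRY_XML))
```

With one language element per `WebReference`, a multilingual center would simply register one endpoint per language, as suggested above.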
>>>
>>> Perhaps we can agree to treat language as a special case because of
>>> its importance, but in general I agree with Dieter here: the
>>> conceptually soundest approach, because it is in line with the
>>> existing infrastructure (CMDI), seems to be to provide CMDI records
>>> for collections and to establish links (via ResourceProxy) between
>>> the endpoint and its corpora/collections.
>>> This is especially important because later we will want to filter by
>>> information other than language, and where do we stop duplicating
>>> this information from the collection records to the endpoint record?
>>
>> Yes and no: I'm implementing the aggregator at the moment, and here I
>> need the language information to let the user select which language
>> a query should run in. The aggregator can be configured in two ways:
>>
>> a) from the "outer" world: for example, the VLO can link from a specific
>> resource or bundle of resources directly to the aggregator. These
>> resources are then automatically preconfigured to be used in a query.
>> b) the aggregator is also a (graphical) user interface to the
>> whole FCS. That means it has to offer all resources available in the
>> FCS to the user, who can then decide which ones to query. In
>> this case, I need the language information *directly* at the endpoints
>> because I don't want the aggregator to be another VLO "engine" that
>> harvests CMDI files from all the centers ;-)
>>
>> Yesterday, Oli proposed using "extraTermData" for that, which makes
>> sense in my opinion:
>>
>> <sru:extraTermData>
>>      <fcs-scan:lang xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
>>        de
>>      </fcs-scan:lang>
>>
>>      <!-- or alternatively, a variant with multiple languages -->
>>      <fcs-scan:langs xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
>>        <fcs-scan:lang>de</fcs-scan:lang>
>>        <fcs-scan:lang>nl</fcs-scan:lang>
>>      </fcs-scan:langs>
>> </sru:extraTermData>
>>
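A sketch of how an aggregator might read the proposed `extraTermData` languages from a scan response and pick the resources for a language the user has chosen. Assumptions: the SRU 1.2 namespace `http://www.loc.gov/zing/srw/`, the `fcs-scan` namespace exactly as written in the proposal, `sru:value` carrying the resource ID, and both the single-`lang` and the `langs` variants being allowed. The sample response and the function names are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

NS = {
    "sru": "http://www.loc.gov/zing/srw/",   # SRU 1.2 namespace
    "scan": "http://ww.clarin.eu/fcs/scan",  # namespace exactly as in the proposal
}

# Made-up scan response using both proposed variants.
SCAN_XML = """<sru:scanResponse xmlns:sru="http://www.loc.gov/zing/srw/"
    xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
  <sru:terms>
    <sru:term>
      <sru:value>corpus-1</sru:value>
      <sru:extraTermData>
        <fcs-scan:lang>de</fcs-scan:lang>
      </sru:extraTermData>
    </sru:term>
    <sru:term>
      <sru:value>corpus-2</sru:value>
      <sru:extraTermData>
        <fcs-scan:langs>
          <fcs-scan:lang>de</fcs-scan:lang>
          <fcs-scan:lang>nl</fcs-scan:lang>
        </fcs-scan:langs>
      </sru:extraTermData>
    </sru:term>
  </sru:terms>
</sru:scanResponse>"""


def resource_languages(xml_text: str) -> Dict[str, Set[str]]:
    """Map each scanned resource (sru:value) to its announced languages."""
    root = ET.fromstring(xml_text)
    result: Dict[str, Set[str]] = {}
    for term in root.iterfind(".//sru:term", NS):
        value = term.findtext("sru:value", default="", namespaces=NS).strip()
        # ".//scan:lang" covers both the single <lang> and the <langs> variant.
        langs = {el.text.strip() for el in term.iterfind(".//scan:lang", NS) if el.text}
        result[value] = langs
    return result


def resources_for(lang: str, languages: Dict[str, Set[str]]) -> List[str]:
    """Resource IDs that announce the given language, in sorted order."""
    return sorted(rid for rid, langs in languages.items() if lang in langs)


langs = resource_languages(SCAN_XML)
print(resources_for("nl", langs))  # ['corpus-2']
```

Under the comma-separation assumption for `x-context`, the selected IDs could then be joined with commas and passed on in a searchRetrieve request.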
>> Best,
>>
>> tom
>>
>>
>> -- 
>> Dr. Thomas Zastrow
>> Seminar fuer Sprachwissenschaft
>> Universitaet Tuebingen
>>
>> Wilhelmstr. 19
>> D-72074 Tuebingen
>>
>> http://www.thomas-zastrow.de
>>
>> Tel.: 07071/29-73968
>> Fax: 07071/29-5214
>>
>> _______________________________________________
>> Dev mailing list
>> Dev at lists.clarin.eu
>> https://lists.clarin.eu/cgi-bin/mailman/listinfo/dev
>
> ***************************************************************
> * Marc Kemps-Snijders
> * Meertens Instituut (Afdeling Technische Ontwikkeling)
> * Joan Muyskenweg 25 /
> * Postbus 94264
> * 1090 GG Amsterdam
> * tel. +31-(0)20-4628550
> * marc.kemps.snijders at meertens.knaw.nl
> ***************************************************************


