[Dev] retrieving the CQL endpoints from the center registry

Matej Durco xnrn at gmx.net
Thu Aug 16 11:41:33 CEST 2012


On 15.08.2012 09:02, Thomas Zastrow wrote:
> On 14.08.12 17:47, Dieter Van Uytvanck wrote:
>> On 14/8/12 17:14 , Thomas Zastrow wrote:
>>> a) At the moment, it seems so that every endpoint represents one
>>> corpus.
>> No, that is not completely correct. See e.g.
>> http://trac.clarin.eu/wiki/RepositoryRegistry#Listofcorporaperendpoint.
>
> Hmm, but as I understand it, I cannot define which corpus should be 
> queried, so when I send a query to an endpoint, all the corpora 
> behind it are always queried at once? If we want to give users the 
> option to choose between corpora, we need a way to specify in the 
> query which corpus should be included.

Didn't we introduce the x-context parameter exactly for this,
i.e. to tell the endpoint to query only against a given corpus?
For this, the endpoint has to expose the list of its corpora (identified by 
(P)IDs)
and then be able to resolve the ID given in the x-context 
parameter to the corresponding corpus.
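
To make the idea concrete, here is a minimal sketch of how a client could
restrict a query to one corpus via x-context. The endpoint URL is the one
quoted further down in this thread; the corpus PID, the SRU version and the
helper name build_sru_url are placeholders for illustration, not agreed
values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(endpoint, cql_query, corpus_pid=None, max_records=10):
    """Build an SRU searchRetrieve URL, optionally limited to one corpus.

    corpus_pid is passed as the extra request parameter "x-context";
    if it is None, the endpoint decides which corpora to search.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",  # assumed version, adjust to what the endpoint speaks
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    if corpus_pid is not None:
        params["x-context"] = corpus_pid
    return endpoint + "?" + urlencode(params)


# Hypothetical PID; a real one would come from the endpoint's corpus list.
url = build_sru_url(
    "http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru",
    "Haus",
    corpus_pid="hdl:0000/example-corpus",
)
print(url)
print(parse_qs(urlsplit(url).query)["x-context"])
```

The endpoint side would then have to map the received ID back to one of the
corpora it exposes, and probably answer with a diagnostic if the ID is
unknown.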

>
>
>> - have a collection record per endpoint (a CMDI giving a language list,
>> modality, etc. for each corpus) to which we can refer in the center
>> registry or from the scan response
>>
>> I think I would like the last option the most, as it is relatively
>> light-weight, not too hard to make and it would also be in the hands of
>> the centers providing the end points (instead of being hardcoded). What
>> is your opinion?
>
> I don't understand exactly what you mean, but I think it would be 
> good to have all the necessary information in one place. I'm 
> parsing the center registry now to find the endpoints; can't we add 
> this information there:
>
> <WebReference>
> <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru
> </Website><Description>CQL</Description>
>
> <lang>de</lang>
>
> </WebReference>
>
> If we do not store information about individual corpora here, 
> every endpoint can only serve one language. But I think this shouldn't 
> be a problem, because every center can define as many endpoints as it 
> wants.

Perhaps we can agree to treat language as a special case because of its
importance, but in general I agree with Dieter here: the conceptually
soundest option, because it is in line with the existing infrastructure
(CMDI), seems to be to provide CMDI records for collections
and establish links (via ResourceProxy) between the endpoint and its 
corpora/collections.
Especially since we will later want to filter by information other 
than language,
and where would we stop duplicating this information from the collection 
records into the endpoint record?
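
A rough sketch of what I mean, with the endpoint holding only links and the
collection records carrying the metadata. All class, field and function
names here are made up for illustration; this is not the CMDI schema, just
the linking idea under the assumption that each link is a PID that resolves
to a collection record.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionRecord:
    """Stand-in for a CMDI collection record (hypothetical fields)."""
    pid: str
    languages: list
    modality: str


@dataclass
class Endpoint:
    """Endpoint entry that only links to its collections by PID."""
    url: str
    resource_proxies: list = field(default_factory=list)


def corpora_matching(endpoint, records, language=None, modality=None):
    """Return the PIDs of the endpoint's corpora that pass the filters.

    records maps PID -> CollectionRecord; links that cannot be
    resolved are skipped.
    """
    hits = []
    for pid in endpoint.resource_proxies:
        rec = records.get(pid)
        if rec is None:
            continue
        if language is not None and language not in rec.languages:
            continue
        if modality is not None and rec.modality != modality:
            continue
        hits.append(pid)
    return hits


# Invented example data.
records = {
    "pid:a": CollectionRecord("pid:a", ["de"], "written"),
    "pid:b": CollectionRecord("pid:b", ["de", "nl"], "spoken"),
}
endpoint = Endpoint("http://example.org/sru", ["pid:a", "pid:b", "pid:missing"])
print(corpora_matching(endpoint, records, language="nl"))
print(corpora_matching(endpoint, records, modality="written"))
```

Adding a new filter criterion then only touches the collection records and
the filter, not the endpoint entry in the registry.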

best,
matej

