[Dev] retrieving the CQL endpoints from the center registry

Thomas Zastrow thomas.zastrow at uni-tuebingen.de
Wed Aug 15 09:02:33 CEST 2012


On 14.08.12 17:47, Dieter Van Uytvanck wrote:
> On 14/8/12 17:14 , Thomas Zastrow wrote:
>> a) At the moment, it seems that every endpoint represents one
>> corpus.
> No, that is not completely correct. See e.g.
> http://trac.clarin.eu/wiki/RepositoryRegistry#Listofcorporaperendpoint.

Hmm, but as I understand it, I cannot specify which corpus should be
queried, so when I send a query to an endpoint, all the corpora behind
it are always queried at once? If we want to give users the possibility
to choose between corpora, we need a way to specify in the query which
corpora should be included.
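
To illustrate what I mean, here is a minimal sketch of how such a query
URL could be built. Only operation, version, query and maximumRecords are
standard SRU parameters; the parameter name "x-corpus" and the corpus
identifiers are purely hypothetical and not part of any agreed
specification.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, cql_query, corpora=None, max_records=10):
    """Build an SRU searchRetrieve URL, optionally restricted to some corpora."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    if corpora:
        # ASSUMPTION: "x-corpus" is a made-up extra request parameter.
        # SRU only requires that extra parameters start with "x-".
        params["x-corpus"] = ",".join(corpora)
    return endpoint + "?" + urlencode(params)


# Hypothetical corpus identifiers, for illustration only.
print(build_search_url(
    "http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru",
    "Haus",
    corpora=["corpus-a", "corpus-b"],
))
```

Without the corpora argument the URL is a plain searchRetrieve request,
which matches the current behaviour of querying everything behind the
endpoint.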


>
>> b) How can I get the language of the resource from the center
>> registry?
> That's not possible at this point. But we need this information, so
> there are some approaches possible:
>
> - hard-code it for now (based on the trac page)
>
> - have it in the scan response, see
> http://trac.clarin.eu/wiki/FCS-specification#Scan
>
> - extract it from the VLO (but that means we need a close
> connection between the aggregator and the VLO; that might be good in the
> long term but will probably take a while to set up)
>
> - have a collection record per endpoint (a CMDI giving a language list,
> modality, etc. for each corpus) to which we can refer in the center
> registry or from the scan response
>
> I think I would like the last option the most, as it is relatively
> light-weight, not too hard to make and it would also be in the hands of
> the centers providing the end points (instead of being hardcoded). What
> is your opinion?

I don't understand exactly what you mean, but I think it would be good
to have all the necessary information in one place. I'm parsing the
center registry now to find the endpoints; can't we add this
information there:

<WebReference>
  <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
  <Description>CQL</Description>
  <lang>de</lang>
</WebReference>

If we don't store information about individual corpora here, every
endpoint can serve only one language. But I think this shouldn't be a
problem, because every center can define as many endpoints as it wants.
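
On the aggregator side, reading this would be simple. A rough sketch,
assuming the <lang> element exists as proposed above (it is not in the
registry today), that CQL endpoints are marked by a Description of "CQL",
and that the elements carry no XML namespace. The <Center> wrapper, the
second WebReference and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample input: the WebReference from this mail plus a non-CQL entry.
# ASSUMPTION: <Center> is a hypothetical wrapper and <lang> is only proposed.
SAMPLE = """<Center>
  <WebReference>
    <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru
    </Website><Description>CQL</Description>
    <lang>de</lang>
  </WebReference>
  <WebReference>
    <Website>http://example.org/home</Website>
    <Description>Homepage</Description>
  </WebReference>
</Center>"""


def cql_endpoints(xml_text):
    """Return (url, language) pairs for every WebReference described as CQL."""
    root = ET.fromstring(xml_text)
    result = []
    for ref in root.iter("WebReference"):
        description = (ref.findtext("Description") or "").strip()
        if description != "CQL":
            continue  # skip homepages and other kinds of references
        url = (ref.findtext("Website") or "").strip()
        # The language is None when the endpoint does not declare one.
        lang = (ref.findtext("lang") or "").strip() or None
        result.append((url, lang))
    return result


print(cql_endpoints(SAMPLE))
```

Endpoints without a <lang> element come back with None as language, so
the aggregator could still fall back to a hard-coded value for them.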

>> c) Are you sure that MPI is giving back KWIC dataview?
> We should. Herman can state this with a higher degree of certainty.
;-)

Best,

Tom



-- 
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214


