[Dev] retrieving the CQL endpoints from the center registry

Marc Kemps-Snijders marc.kemps.snijders at meertens.knaw.nl
Thu Aug 16 12:03:16 CEST 2012


For your information,
the Mimore service currently supports search across three resources (MAND, SAND and DIDDD), so the x-context parameter is of direct interest to us as an endpoint parameter.

Marc

On Aug 16, 2012, at 11:56 AM, Thomas Zastrow wrote:

Hi Matej,

On 16.08.12 11:41, Matej Durco wrote:

Haven't we introduced the x-context parameter exactly for this,
to tell the endpoint to query only a given corpus?
For this, the endpoint has to expose the list of corpora (identified
by (P)IDs)
and subsequently be able to resolve the ID given in the x-context
parameter to the corresponding corpus.

Yes, you are right, but so far not many endpoints support that
functionality in practice. I have already integrated it in my
code, though.
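
A minimal sketch of how a client might restrict a query to one corpus via x-context, assuming an SRU 1.2 searchRetrieve request. The helper name `build_search_url` and the corpus PID are hypothetical; the endpoint URL is the Tuebingen one quoted further down in this thread.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, context=None, max_records=10):
    """Build an SRU searchRetrieve URL, optionally restricted via x-context."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": query,
        "maximumRecords": str(max_records),
    }
    if context is not None:
        # x-context carries the (P)ID of the corpus the endpoint should search.
        params["x-context"] = context
    return endpoint + "?" + urlencode(params)


url = build_search_url(
    "http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru",
    "Haus",
    context="hdl:0000/example-corpus",  # hypothetical PID, for illustration only
)
print(url)
```

An endpoint that does not support x-context may simply ignore the extra parameter, so a client probably still needs to know which endpoints honour it.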




- have a collection record per endpoint (a CMDI giving a language list,
modality, etc. for each corpus) to which we can refer in the center
registry or from the scan response

I think I would like the last option the most, as it is relatively
light-weight, not too hard to make and it would also be in the hands of
the centers providing the end points (instead of being hardcoded). What
is your opinion?

I don't exactly understand what you mean, but I would think that it
would be good to have all the necessary information in one place. I'm
parsing the center registry now to find the endpoints, can't we add
this information there:

<WebReference>
<Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
<Description>CQL</Description>
<lang>de</lang>
</WebReference>

If we do not store information about individual corpora here,
every endpoint can only serve one language. But I think this
shouldn't be a problem because every center can define as many
endpoints as it wants.
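
A rough sketch of what reading such an entry could look like, assuming the registry exposes plain, un-namespaced `WebReference` elements as in the fragment above. The `<lang>` element is only the proposal made here, not an existing registry field, and the function name `cql_endpoints` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment as proposed above; <lang> is the suggested addition.
REGISTRY_FRAGMENT = """
<WebReference>
  <Website>http://weblicht.sfs.uni-tuebingen.de/rws/cqp-ws/cqp/sru</Website>
  <Description>CQL</Description>
  <lang>de</lang>
</WebReference>
"""


def cql_endpoints(xml_text):
    """Return (url, lang) pairs for every WebReference described as CQL."""
    root = ET.fromstring(xml_text)
    endpoints = []
    # iter() also yields the root itself, so a bare <WebReference> works too.
    for ref in root.iter("WebReference"):
        if (ref.findtext("Description") or "").strip() != "CQL":
            continue
        url = (ref.findtext("Website") or "").strip()
        # lang is None when the endpoint does not declare a language.
        lang = (ref.findtext("lang") or "").strip() or None
        endpoints.append((url, lang))
    return endpoints


print(cql_endpoints(REGISTRY_FRAGMENT))
```

The `.strip()` calls are there because the URL in the registry may be followed by a line break before the closing tag.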

Perhaps we can agree to handle language specially because of its importance,
but generally I agree with Dieter here: the conceptually sanest option,
because it is in line with the existing infrastructure (CMDI), seems to be
to provide CMDI records for collections
and establish links (via ResourceProxy) between the endpoint and its
corpora/collections.
Especially because we will later want to filter by information other
than language,
and where do we stop duplicating this information from the collection
records to the endpoint record?

Yes and no - I'm implementing the aggregator at the moment and here I
need the information about the language to offer the user a possibility
to select in which language a query should run. The aggregator can be
configured in two ways:

a) from the "outer" world, for example the VLO can link from a specific
resource or bundle of resources directly to the aggregator. These
resources are then automatically preconfigured to be used in a query
b) second, the aggregator is also a (graphical) user interface to the
whole FCS. That means it has to offer all resources available in the FCS
to the user, who can then decide which ones to query. In
this case, I need the language information *directly* at the endpoints
because I don't want the aggregator to be another VLO-"engine" which
harvests CMDI files from all the centers ;-)

Yesterday, Oli proposed using "extraTermData" for that, which makes
sense in my opinion:

<sru:extraTermData>
     <fcs-scan:lang xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
       de
     </fcs-scan:lang>

     <!-- or alternatively, a variant with several languages -->
     <fcs-scan:langs xmlns:fcs-scan="http://ww.clarin.eu/fcs/scan">
       <fcs-scan:lang>de</fcs-scan:lang>
       <fcs-scan:lang>nl</fcs-scan:lang>
     </fcs-scan:langs>
</sru:extraTermData>
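
A minimal sketch of how the aggregator might read the languages out of such a block, covering both the single-language and the multi-language variant. It assumes the SRU 1.1/1.2 namespace for the `sru` prefix, copies the `fcs-scan` namespace URI exactly as written in the proposal, and requires well-formed closing tags (`</fcs-scan:lang>`). The function name `term_languages` is hypothetical.

```python
import xml.etree.ElementTree as ET

SRU_NS = "http://www.loc.gov/zing/srw/"   # SRU 1.1/1.2 namespace (assumed here)
SCAN_NS = "http://ww.clarin.eu/fcs/scan"  # URI copied as written in the proposal

SAMPLE = f"""
<sru:extraTermData xmlns:sru="{SRU_NS}" xmlns:fcs-scan="{SCAN_NS}">
  <fcs-scan:langs>
    <fcs-scan:lang>de</fcs-scan:lang>
    <fcs-scan:lang>nl</fcs-scan:lang>
  </fcs-scan:langs>
</sru:extraTermData>
"""


def term_languages(xml_text):
    """Collect language codes from an extraTermData block.

    Works for a single <fcs-scan:lang> child as well as for several
    <fcs-scan:lang> elements wrapped in <fcs-scan:langs>.
    """
    root = ET.fromstring(xml_text)
    tag = f"{{{SCAN_NS}}}lang"
    # iter() walks all descendants, so nesting depth does not matter.
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


print(term_languages(SAMPLE))
```

Stripping the element text matters because the single-language variant puts the code on its own indented line.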

Best,

tom


--
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214


***************************************************************
* Marc Kemps-Snijders
* Meertens Instituut (Afdeling Technische Ontwikkeling)
* Joan Muyskenweg 25 /
* Postbus 94264
* 1090 GG Amsterdam
* tel. +31-(0)20-4628550
* marc.kemps.snijders at meertens.knaw.nl
***************************************************************






