[Dev] retrieving the CQL endpoints from the center registry

Marc Kemps-Snijders marc.kemps.snijders at meertens.knaw.nl
Thu Aug 16 15:30:55 CEST 2012


Two (possibly useful) comments:

1. One does not know in advance where the language of a resource is located in a resource description. Following the CMDI philosophy, it could be embedded anywhere in the CMDI document. Most resources contain more than one language specification, and what each one means depends on the element's context.

2. We are working out the b) route at the moment, since the harvesting and indexing step has already been taken care of. So on our side we should simply be able to query the available CMDI files for all SRU endpoints and start the content search from there, or insert a content search engine widget at the UI level for each resource that has a content search engine link.
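
To illustrate the first comment: because the language can sit anywhere in the document, a consumer cannot rely on one fixed path and has to search the whole tree, keeping the context of each hit. The following is only a minimal sketch. The sample record, its element names and the default name list ("language") are hypothetical and do not follow any particular CMDI profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMDI-like record: element names and nesting are made up
# for illustration and do not follow any particular CMDI profile.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Components>
    <Corpus>
      <Language>nld</Language>
      <Source>
        <Language>deu</Language>
      </Source>
    </Corpus>
  </Components>
</CMD>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def find_language_elements(xml_text, names=("language",)):
    """Return (path, text) for every element whose local name matches.

    `names` is an assumed list of lower-case element names to look for.
    """
    hits = []

    def walk(elem, path):
        name = local_name(elem.tag)
        here = path + [name]
        text = (elem.text or "").strip()
        if name.lower() in names and text:
            hits.append(("/".join(here), text))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), [])
    return hits


for path, value in find_language_elements(SAMPLE):
    print(path, "->", value)
```

The path is returned together with the value so that the caller can decide from the context which of several language specifications is the one it needs.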

Marc

On Aug 16, 2012, at 2:57 PM, Thomas Zastrow wrote:

On 16.08.12 14:44, Herman Stehouwer wrote:
Shouldn't corpora already have CMDI files?
Shouldn't those CMDI files already contain all the information you are going to get about a specific corpus?

Otherwise we can keep adding stuff ...

Yes - but as I already said:

a)
The center registry is my starting point for finding out which
endpoints are available.

b)
I need information about the corpora at these endpoints, for example the
language. If I don't find this information directly via the endpoints,
it would mean that

For every corpus ...

1.)
Find the CMDI file: Resolve the PID and parse that document (most, but
not all, people are using the handle system, which means that at this
point I probably have to deal with more than one PID resolver format ...)

2.)
Harvest the CMDI file

3.)
Parse the CMDI file

So, that would be *much* more effort on the user interface side of the
FCS. At the moment, with 10 corpora or so, I can harvest the necessary
information from the center registry and the endpoints in real time.
Taking the detour via the CMDI files would slow everything down
a lot.
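
The three per-corpus steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: the accepted PID prefixes (`hdl:`, `urn:nbn:`), the use of the hdl.handle.net proxy and of nbn-resolving.org as resolvers, and the function names `pid_to_url` and `fetch_cmdi` are all assumptions, not what any center actually requires.

```python
import urllib.request
import xml.etree.ElementTree as ET


def pid_to_url(pid):
    """Step 1: turn a PID into a resolvable URL.

    Only two PID styles are handled here; both resolver URLs are assumptions.
    """
    pid = pid.strip()
    if pid.startswith(("http://", "https://")):
        return pid  # already a resolvable URL
    if pid.lower().startswith("hdl:"):
        return "http://hdl.handle.net/" + pid[4:]
    if pid.lower().startswith("urn:nbn:"):
        return "https://nbn-resolving.org/" + pid
    raise ValueError("unsupported PID format: " + pid)


def fetch_cmdi(pid, timeout=10):
    """Steps 2 and 3: download the CMDI file and parse it into an XML tree."""
    with urllib.request.urlopen(pid_to_url(pid), timeout=timeout) as resp:
        return ET.fromstring(resp.read())


def fetch_all(pids):
    """Run the three steps once per corpus; each call is a network round trip."""
    return {pid: fetch_cmdi(pid) for pid in pids}
```

Even in this small form, every corpus costs at least one resolver redirect plus one download before anything can be parsed, which is where the extra delay compared to reading the endpoints directly would come from.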

So, let's be pragmatic: we have less than 10 months to finish the whole
thing and not many people are *really* writing code at the moment ...

Best,

Tom


--
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214


***************************************************************
* Marc Kemps-Snijders
* Meertens Instituut (Afdeling Technische Ontwikkeling)
* Joan Muyskenweg 25 /
* Postbus 94264
* 1090 GG Amsterdam
* tel. +31-(0)20-4628550
* marc.kemps.snijders at meertens.knaw.nl
***************************************************************






