[Dev] Validation of OAI-PMH output for CLARIN centers

Thomas Zastrow thomas.zastrow at uni-tuebingen.de
Wed Feb 20 10:05:09 CET 2013


Dear CLARIN colleagues,

As CLARIN centers, we publish our metadata via the OAI-PMH 
protocol. For this purpose, the metadata from many CMDI files is 
concatenated and offered via an OAI-PMH provider.

To make sure that your CLARIN center is compatible with the OAI-PMH 
standard, please test your OAI-PMH provider at

http://re.cs.uct.ac.za/

We would like to draw your attention to the use of XML IDs in the CMDI 
files: XML IDs must be unique within a single XML document. However, 
when CMDI files are concatenated into one OAI-PMH response, IDs from 
different CMDI files may have the same value. In that case, the 
OAI-PMH output is no longer valid. Please make sure that this does 
not happen in your repository!
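To find such collisions before a harvester does, you can count the ID attributes in a saved OAI-PMH response. Here is a minimal Python sketch, assuming the IDs sit in plain `id` attributes (as on CMDI ResourceProxy elements) or in `xml:id`; the function name `find_duplicate_ids` is hypothetical and not part of any CLARIN tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Attributes treated as XML IDs. Assumption: CMDI ResourceProxy elements
# carry their ID in a plain "id" attribute; xml:id is checked as well.
ID_ATTRS = ("id", "{http://www.w3.org/XML/1998/namespace}id")


def find_duplicate_ids(xml_text: str) -> list[str]:
    """Return every ID value that occurs more than once in one XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(attr)
        for elem in root.iter()          # all elements, in document order
        for attr in ID_ATTRS
        if elem.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Two concatenated records that both use the ID "r1".
response = (
    '<OAI-PMH>'
    '<record><CMD><ResourceProxy id="r1"/></CMD></record>'
    '<record><CMD><ResourceProxy id="r1"/><ResourceProxy id="r2"/></CMD></record>'
    '</OAI-PMH>'
)
print(find_duplicate_ids(response))  # ['r1']
```

Running this against a multi-record `ListRecords` response from your own provider shows whether the problem described above affects your repository.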

Among others, you have the following options to achieve uniqueness of 
XML IDs within OAI responses:

* Use XML IDs that are unique across all CMDI files in your repository. 
Please note that prepending or concatenating Handle PIDs is not a good 
solution, because Handle syntax is not compatible with XML ID syntax 
(for example, Handles contain '/' and usually start with a digit).

* Limit the response set size of your OAI provider to one record per 
request. In this case, your OAI provider must fully support 
resumption tokens. Please note that this approach will increase 
harvesting time and network traffic (and, depending on your OAI 
provider, the system load on your repository).

* Reassign XML IDs when generating an OAI response. Please note that 
this approach usually requires modifications to your OAI provider 
implementation.
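For the first option, a quick syntax check shows why Handle PIDs cannot simply be used (or prepended) as XML IDs. This Python sketch tests only the ASCII subset of the NCName rules that XML IDs follow (the full rules also admit many non-ASCII characters); the example Handle is made up and `is_valid_xml_id` is a hypothetical helper.

```python
import re

# ASCII subset of the XML NCName production that xs:ID values must match:
# first character a letter or '_', then letters, digits, '.', '-' or '_'.
# Colons and slashes are never allowed.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_xml_id(value: str) -> bool:
    """Return True if value is an ASCII-only NCName, i.e. usable as an XML ID."""
    return NCNAME_ASCII.fullmatch(value) is not None


# Hypothetical Handle PID, used only for illustration.
handle = "11858/00-097C-0000-0000-0000-0"
print(is_valid_xml_id(handle))           # False: starts with a digit, contains '/'
print(is_valid_xml_id("hdl:" + handle))  # False: ':' and '/' are not allowed
print(is_valid_xml_id("resource_1"))     # True
```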
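For the second option, every record arrives in its own response, so the harvesting loop depends entirely on the resumption token. Here is a minimal Python sketch of the client side, assuming the OAI-PMH 2.0 namespace and a hypothetical base URL: a `ListRecords` response that is not the last one carries a non-empty `resumptionToken`, the last one carries an empty one, and the follow-up request sends only `verb` and `resumptionToken`. The helper name `next_request_url` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_request_url(base_url: str, response_xml: str) -> Optional[str]:
    """Build the follow-up ListRecords URL, or return None if the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    # No element, or an empty one, means this was the last part of the list.
    if token is None or not (token.text or "").strip():
        return None
    # resumptionToken is an exclusive argument: only verb may accompany it.
    query = urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
    return f"{base_url}?{query}"


page = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record/>
    <resumptionToken cursor="0" completeListSize="3">tok123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
print(next_request_url("https://example.org/oai", page))
# https://example.org/oai?verb=ListRecords&resumptionToken=tok123
```

With one record per response, a harvester calls this once per record, which is where the extra harvesting time and traffic come from.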
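For the third option, the provider can rewrite IDs per record just before serializing the response. This is a minimal Python sketch under stated assumptions, not a drop-in patch for any particular provider: IDs are assumed to live in plain `id` attributes and references to them in whitespace-separated `ref` attributes (as with CMDI ResourceProxy IDs and component refs), and `reassign_ids` plus the `rec<n>_` prefix are hypothetical choices.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def reassign_ids(response: ET.Element) -> None:
    """Prefix every id, and every ref pointing to it, with a per-record tag.

    Assumption: IDs are in "id" attributes and references in
    whitespace-separated "ref" attributes; adapt to your schema.
    """
    for n, record in enumerate(response.iter(f"{OAI_NS}record"), start=1):
        prefix = f"rec{n}_"
        # First pass: rename the IDs of this record and remember old -> new.
        mapping = {}
        for elem in record.iter():
            old = elem.get("id")
            if old is not None:
                mapping[old] = prefix + old
                elem.set("id", mapping[old])
        # Second pass: rewrite references so they still resolve within the record.
        for elem in record.iter():
            refs = elem.get("ref")
            if refs is not None:
                elem.set("ref", " ".join(mapping.get(r, r) for r in refs.split()))
```

Because each prefix ends in "_" right after the record number, no prefix is the start of another, so IDs that were unique within each record stay unique across the response. A position-based prefix is simple but gives the same record different IDs in different responses; a prefix derived from a hash of the OAI identifier would be stable.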

Best regards,

Dieter, Oli and Tom

-- 
Dr. Thomas Zastrow
Seminar fuer Sprachwissenschaft
Universitaet Tuebingen

Wilhelmstr. 19
D-72074 Tuebingen

http://www.thomas-zastrow.de

Tel.: 07071/29-73968
Fax: 07071/29-5214


