CROSSREF by ATYPON

Author: Hisham Shahtout - atypon systems

Real-time queries in CrossRef


Introduction

Real-time queries give CrossRef users an interface that resolves their queries as quickly as possible. The interface is stream-based, and CrossRef processes the incoming queries concurrently. Below we describe the limitations of HTTP-based queries and explain how the real-time query interface overcomes them.

HTTP based queries

HTTP-based queries are the most common way to query the CrossRef system. New publishers can get up to speed on CrossRef fairly easily because the HTTP protocol is simple and ubiquitous, and CrossRef provides sample code that makes the task even simpler. However, HTTP-based queries have certain drawbacks that make them unsuitable for linking to CrossRef in real time. These drawbacks can be summarized as follows:

Architecture of real-time queries

The best way to think of the real-time query interface is as a telnet session into CrossRef. In fact, you can test the real-time query interface by establishing a telnet connection with CrossRef. From a shell prompt, type the following:
~>telnet doi.crossref.org 8081
This establishes a connection to CrossRef on port 8081, the port where the "real-time server" runs on the CrossRef side. The first thing to do is to authorize the queries that will come in on this connection. This is done through a header line similar to the one supplied in asynchronous queries. Type the following into the telnet session (don't forget to change <usr> and <pwd> to the username and password supplied to you by CrossRef):
H: usr=<usr>;pwd=<pwd>
CrossRef should reply with the following string:
AUTHORIZED
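The same handshake can be scripted. What follows is a minimal Python sketch, not CrossRef sample code: the function names authorize and open_streams are hypothetical, and it assumes that lines are newline-terminated, that the text is UTF-8, and that any reply other than AUTHORIZED means the credentials were rejected. None of these details are stated above.

```python
import io
import socket


def authorize(reader, writer, usr, pwd):
    """Send the "H:" header line and report whether CrossRef accepted it.

    reader and writer are text streams. Assumption: one newline-terminated
    header line is answered by one reply line.
    """
    writer.write(f"H: usr={usr};pwd={pwd}\n")
    writer.flush()
    reply = reader.readline().strip()
    # Assumption: anything other than AUTHORIZED counts as a failure.
    return reply == "AUTHORIZED"


def open_streams(host="doi.crossref.org", port=8081):
    """Open the TCP connection and wrap it in text streams.

    Host and port come from the telnet example; the encoding is an assumption.
    """
    sock = socket.create_connection((host, port))
    reader = sock.makefile("r", encoding="utf-8", newline="\n")
    writer = sock.makefile("w", encoding="utf-8", newline="\n")
    return sock, reader, writer


# Offline check against in-memory streams, so no network access is needed.
fake_reply, sent = io.StringIO("AUTHORIZED\n"), io.StringIO()
print(authorize(fake_reply, sent, "<usr>", "<pwd>"))  # prints True
```

Against the live server, the streams returned by open_streams would be passed to authorize in place of the in-memory ones.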
Now you have a query stream into CrossRef. Copy and paste the following queries into your telnet session with CrossRef:
|Clim Dyn|Noguer|14||691|1998||K1|
|J Clim|Leung|12||2010|1999||K2|
|Mon Weather Rev|Laprise|128||4149|2000||K3|
|J Atmos Sci|Kain|47||2784|1990||K4|
|Enzyme Microb Technol|Wang|25||177|1999||K5|
|Q J R Meteorol Soc|Jones|123||265|1997||K6|
|Q J R Meteorol Soc|Jones|121||1413|1995||K7|
|J Clim|Ji|10||1965|1997||K8|
|Bull Environ Contam Toxicol|Saha|63||195|1999||K9|
|Appl Microbiol Biotechnol|Hidalgo|58||260|2002||K10|
You should receive the results of the queries on the output stream from CrossRef. So far so good. You might be asking yourself: so what is new here? Three things:
  1. Authorization was done once. As long as the connection with CrossRef is open, you can write as many queries to the stream as you like without reauthenticating. If you wish to change the "identity" of the connection, you can supply a different username and password at any time with a new "H:"-prefixed header.
  2. The second important distinction is the way CrossRef handles the queries. Every line is translated into a thread of execution on CrossRef, so you gain true concurrency without having to establish a separate (HTTP) connection to CrossRef and authenticate yourself on that connection.
  3. The third distinction is that your connection to CrossRef is stream-based. CrossRef starts executing queries as it reads lines from the stream. For example, when I ran the queries above I got the following output on my connection to CrossRef:
    |Clim Dyn|Noguer|14||691|1998||K1|
    |J Clim|Leung|12||2010|1999||K2|
    09307575|Climate Dynamics|Noguer|14|10|691|1998|full_text|K1|10.1007/s003820050249
    |Mon Weather Rev|Laprise|128||4149|2000||K3|
    |J Atmos Sci|Kain|47||2784|1990||K4|
    15200442,08948755|Journal of Climate|Leung|12|7|2010|1999|abstract_only|K2|10.1175/1520-0442(1999)012<2010:PNCSSB>2.0.CO;2
    |Enzyme Microb Technol|Wang|25||177|1999||K5|
    15200493,00270644|Monthly Weather Review|Laprise|128|12|4149|2000|abstract_only|K3|10.1175/1520-0493(2000)129<4149:POANLA>2.0.CO;2
    15200469,00224928|Journal of the Atmospheric Sciences|Kain|47|23|2784|1990|abstract_only|K4|10.1175/1520-0469(1990)047<2784:AODEPM>2.0.CO;2
    |Q J R Meteorol Soc|Jones|123||265|1997||K6|
    |Q J R Meteorol Soc|Jones|121||1413|1995||K7|
    01410229|Enzyme and Microbial Technology|Wang|25|3-5|177|1999|full_text|K5|10.1016/S0141-0229(99)00060-5
    00000000,00359009|Quarterly Journal of the Royal Meteorological Society|JONES|123|538|265|1997|full_text|K6|10.1256/smsqj.53801
    |J Clim|Ji|10||1965|1997||K8|
    00000000,00359009|Quarterly Journal of the Royal Meteorological Society|JONES|121|526|1413|1995|full_text|K7|10.1256/smsqj.52609
    |Bull Environ Contam Toxicol|Saha|63||195|1999||K9|
    15200442,08948755|Journal of Climate|Ji|10|8|1965|1997|abstract_only|K8|10.1175/1520-0442(1997)010<1965:SOTASM>2.0.CO;2
    |Appl Microbiol Biotechnol|Hidalgo|58||260|2002||K10|
    00074861|Bulletin of Environmental Contamination and Toxicology|Saha|63|2|195|1999|full_text|K9|10.1007/s001289900966
    01757598|Applied Microbiology and Biotechnology|Hidalgo|58|2|260|2002|full_text|K10|10.1007/s00253-001-0876-5
    
    Note that the query with key=K1 was resolved and written to the output stream even before K3 was read from the input stream. This brings up an important point: to use real-time queries efficiently, you have to supply your own unique keys in the piped queries so that you can map the results back to them. Because CrossRef must resolve queries in real time, and because of the high level of concurrency it uses to achieve this, it can't guarantee the timing or order in which results are returned. You have to read the results from CrossRef's output stream and map them back to your original queries.
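The key-based matching described above can be sketched in Python as follows. This is an illustration under stated assumptions, not CrossRef code: the helper names and the field names in Result are hypothetical labels inferred from the sample lines (ten pipe-separated fields, with the key in the ninth position and the DOI in the tenth), and a line whose DOI field is empty is assumed to be unresolved.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Result(NamedTuple):
    # Field names are guesses based on the sample output, not an official schema.
    issn: str
    journal: str
    author: str
    volume: str
    issue: str
    page: str
    year: str
    resource_type: str
    key: str
    doi: str


def format_query(journal, author, volume, page, year, key):
    """Build one piped query line, leaving the unknown fields empty."""
    return f"|{journal}|{author}|{volume}||{page}|{year}||{key}|"


def parse_result(line: str) -> Optional[Result]:
    """Parse one output line; return None if it carries no DOI."""
    # maxsplit=9 keeps the DOI intact even if it contained a "|" character.
    parts = line.strip().split("|", 9)
    if len(parts) != 10 or not parts[9]:
        return None
    return Result(*parts)


def match_results(pending: Dict[str, object], lines: Iterable[str]) -> Dict[str, Result]:
    """Map results back to the caller's queries by key, in whatever order they arrive."""
    resolved = {}
    for line in lines:
        result = parse_result(line)
        if result is not None and result.key in pending:
            resolved[result.key] = result
    return resolved


# Results may arrive out of order; the keys restore the association.
pending = {"K1": "reference 1", "K5": "reference 5"}
output = [
    "01410229|Enzyme and Microbial Technology|Wang|25|3-5|177|1999|full_text|K5|10.1016/S0141-0229(99)00060-5",
    "09307575|Climate Dynamics|Noguer|14|10|691|1998|full_text|K1|10.1007/s003820050249",
]
for key, result in sorted(match_results(pending, output).items()):
    print(key, result.doi)
```

Keeping the pending queries in a dictionary keyed by your own identifiers means the order of arrival never matters; a key that never shows up in the output simply stays unresolved.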


© 2000-2003 PILA, Inc.
Software based on the Literatum™ platform from Atypon systems