The Data Access Server Project

Introduction

The LTER Network has invested considerable time, effort, and funding into the collection of scientific data. Access to and use of this data is formalized through the end user's acceptance of the LTER Network Data Access Policy, Data Access Requirements, and the General Data Use Agreement (http://www.lternet.edu/data/netpolicy.html), which were approved by the LTER Network Coordinating Committee on 6 April 2005. These policies and agreements are motivated by the need to document the flow of data from the LTER Network out to the community in order to validate the broader impacts of the LTER program. As such, the LTER Network has adopted a "standard" for data access and use that needs to be implemented in both local and network-wide computing infrastructure. This standard requires that the end user provide basic identifying information, including name, affiliation, email address, and contact information, in an electronic format that can be provided to the data owner. Further, the end user's acknowledgment and acceptance of the General Public Use Agreement and/or the Restricted Data Use Agreement applied to a data set, together with a statement of the intended use of the LTER data, will be recorded prior to the release of any LTER data.

Approach

To support sites in their compliance with LTER data policies, the DAS project will provide sites with the ability to replace “direct link” data URLs with proxy URLs, which will route all data requests through an authentication, auditing, and notification service called the Data Access Server (DAS). The DAS will support five primary objectives that have been defined through analysis of use case scenarios:

  1. end user registration and acceptance of the LTER data agreements, including authentication;
  2. URL management;
  3. notification of data access to the Data Set Contact;
  4. auditing of all data access; and
  5. reporting.

The basic notion of the DAS is that sites may register a data URL expression in the DAS URL management interface and replace the data URL with a similar representation using a proxy URL. The goal of the DAS is to route end users (including non-interactive applications) through the DAS so that access to LTER data can be logged and the appropriate contact can be notified of the access event. The DAS strives to use as much existing LTER cyberinfrastructure as is reasonable, including:

  1. the LTER Data Catalog servlet infrastructure to support the integration of DAS components;
  2. Ecological Metadata Language documents in the LTER Data Catalog to identify appropriate contact information for notification and to ascertain data table entity names; and
  3. the LTER LDAP for end user contact information and authentication (authentication will also use LDAP databases of our affiliate networks, such as NCEAS and PISCO).
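
The URL registration idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the DAS implementation: the `UrlRegistry` class, the `DAS_BASE` address, and the use of a random identifier in the proxy URL are all hypothetical, and the sketch treats each registration as an exact URL rather than a "URL expression".

```python
from typing import Dict
from uuid import uuid4

# Hypothetical base address for proxy URLs; the real DAS address is not given here.
DAS_BASE = "https://das.example.org/data"


class UrlRegistry:
    """Keeps a one-to-one mapping between site data URLs and proxy URLs."""

    def __init__(self) -> None:
        self._proxy_by_data: Dict[str, str] = {}
        self._data_by_proxy: Dict[str, str] = {}

    def register(self, data_url: str) -> str:
        """Register a data URL and return its proxy URL.

        Registering the same data URL twice returns the same proxy URL,
        which keeps the correspondence one-to-one.
        """
        if data_url in self._proxy_by_data:
            return self._proxy_by_data[data_url]
        proxy_url = f"{DAS_BASE}/{uuid4().hex}"
        self._proxy_by_data[data_url] = proxy_url
        self._data_by_proxy[proxy_url] = data_url
        return proxy_url

    def resolve(self, proxy_url: str) -> str:
        """Return the data URL behind a proxy URL (KeyError if unknown)."""
        return self._data_by_proxy[proxy_url]
```

A site would call `register()` once per data URL and publish the returned proxy URL in its metadata; the service would call `resolve()` when a request for that proxy URL arrives.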

A graphical representation of the DAS network topology is shown in Figure 1.


Figure 1: Generalized network-level architecture of the Data Access Server model.

The LTER NIS development team has identified a general model for a Network-wide LTER Data Access Policy implementation strategy called the Data Access Server (DAS). The DAS model proposes a centralized NIS service that would be integrated into the LTER Data Catalog infrastructure (see Figure 2). This approach takes advantage of common tasks already implemented within the Data Catalog, such as authentication and audit logging. The DAS service components would add five areas of functionality to the LTER Data Catalog as noted above.


Figure 2: Software stack of the Data Access Server model.

End user registration and authentication will take place through the standard LTER Data Catalog user interface. Registration will include forms to accept the LTER Data Access Policy and an area to enter the intended use of the LTER data. Once registered and authenticated, users may accept a web-browser cookie that will allow seamless access to data for future requests.

The pass-through process will rely on the replacement of the URL that references site data with a "proxy" URL that points instead to the DAS. This approach requires the site to register its data URL with the DAS so that a one-to-one correspondence between the data URL and the proxy URL can be determined by the DAS service. The proxy URL is used in lieu of the actual data URL within any LTER metadata document (including EML) that is published for public viewing. When an end user wishes to download data by selecting the online distribution URL in the metadata document, they would be directed to the DAS first and have their credentials validated before a data stream is returned on the site's behalf. If the end user has not registered at this point, they would be directed to the appropriate registration interface. If they have already registered and a token (e.g., a cookie) exists on their workstation, they would be provided the data without restriction. Otherwise, the end user would be directed to a log-in interface prior to receiving any data.
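
The three outcomes described in this paragraph amount to a small decision procedure. The sketch below is a minimal illustration, assuming the service can already tell whether the requester is registered and whether a valid token accompanied the request; the `Action` enum and the `route_request` function are hypothetical names, not part of the DAS.

```python
from enum import Enum


class Action(Enum):
    """Possible responses to a request for a proxy URL."""
    REGISTER = "redirect to registration interface"
    LOGIN = "redirect to log-in interface"
    SERVE = "return data stream"


def route_request(is_registered: bool, has_valid_token: bool) -> Action:
    """Decide how to answer a data request, following the order in the text."""
    if not is_registered:
        # Unregistered users must register (and accept the agreements) first.
        return Action.REGISTER
    if has_valid_token:
        # Registered user with a token (e.g., cookie): data without restriction.
        return Action.SERVE
    # Registered but no token present: ask the user to log in.
    return Action.LOGIN
```

For example, `route_request(is_registered=True, has_valid_token=False)` yields `Action.LOGIN`, which matches the "otherwise" case above.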

Any download event that the end user invokes through the DAS will send an email notification to the data owner/provider, along with the end user's contact information and the name of the downloaded data set. The DAS will also send a notification to the end user with the data owner/provider's contact information, the General Use Agreement, and any special Restricted Data Use Agreement for the specific data set that was downloaded.
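
As a rough sketch of the two notifications described above, the following builds (but does not send) one message for the data owner and one for the end user, using Python's standard `email.message.EmailMessage`. The sender address, the function name, the parameter names, and the message wording are all assumptions made for illustration.

```python
from email.message import EmailMessage
from typing import Tuple

DAS_SENDER = "das@example.org"  # hypothetical sender address


def build_notifications(
    user_name: str,
    user_email: str,
    owner_name: str,
    owner_email: str,
    dataset_name: str,
    agreement_text: str,
) -> Tuple[EmailMessage, EmailMessage]:
    """Return (message to data owner, message to end user) for one download."""
    to_owner = EmailMessage()
    to_owner["From"] = DAS_SENDER
    to_owner["To"] = owner_email
    to_owner["Subject"] = f"Data download: {dataset_name}"
    # The owner learns who downloaded which data set.
    to_owner.set_content(
        f"{user_name} <{user_email}> downloaded the data set '{dataset_name}'."
    )

    to_user = EmailMessage()
    to_user["From"] = DAS_SENDER
    to_user["To"] = user_email
    to_user["Subject"] = f"Your download of {dataset_name}"
    # The user receives the owner's contact details and the applicable agreements.
    to_user.set_content(
        f"Data set contact: {owner_name} <{owner_email}>\n\n"
        f"Applicable agreements:\n{agreement_text}"
    )
    return to_owner, to_user
```

Handing the two messages to a mail transport (for example `smtplib`) is deliberately left out, since the text does not say how the DAS delivers mail.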

The DAS will also log all event information into its audit log table; this logging will be in addition to the logging that is already part of the routine of the Data Catalog.
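
An audit log table of this kind might look like the sketch below. It uses an in-memory SQLite database purely for illustration; the table name `das_audit`, its columns, and the choice of SQLite are assumptions, since the text does not describe the actual schema or database.

```python
import sqlite3
from datetime import datetime, timezone


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) a hypothetical audit log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS das_audit ("
        " event_time TEXT NOT NULL,"    # UTC timestamp, ISO 8601
        " user_id TEXT NOT NULL,"       # who requested the data
        " proxy_url TEXT NOT NULL,"     # which proxy URL was requested
        " dataset_name TEXT NOT NULL)"  # which data set was served
    )
    return conn


def log_access(
    conn: sqlite3.Connection, user_id: str, proxy_url: str, dataset_name: str
) -> None:
    """Append one data access event to the audit log."""
    conn.execute(
        "INSERT INTO das_audit VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_id, proxy_url, dataset_name),
    )
    conn.commit()
```

Each served request would result in one `log_access()` call, independent of whatever the Data Catalog itself records.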

The DAS will provide periodic reports of all data access events to LTER sites, in addition to supporting an interface that will allow site information managers to perform interactive queries for specific events.
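
The two reporting modes mentioned here (periodic summaries and interactive queries) can be illustrated over a plain list of access events. Everything in this sketch is hypothetical: the `AccessEvent` record, its fields, and both function names are stand-ins, and a real service would presumably query its audit log instead.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class AccessEvent(NamedTuple):
    """One logged data access (fields are illustrative)."""
    site: str
    dataset_name: str
    user_id: str
    event_date: str  # ISO 8601 date, e.g. "2008-12-15"


def periodic_report(events: Iterable[AccessEvent], site: str) -> Counter:
    """Count downloads per data set for one site."""
    return Counter(e.dataset_name for e in events if e.site == site)


def query_events(
    events: Iterable[AccessEvent],
    site: Optional[str] = None,
    user_id: Optional[str] = None,
    since: Optional[str] = None,
) -> List[AccessEvent]:
    """Return events matching every filter that was supplied."""
    return [
        e for e in events
        if (site is None or e.site == site)
        and (user_id is None or e.user_id == user_id)
        # ISO 8601 dates sort correctly as plain strings.
        and (since is None or e.event_date >= since)
    ]
```

A site information manager could, for instance, call `query_events(events, site="SITE_A", since="2008-12-01")` to list recent accesses to that site's data.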

Advantages

  1. The DAS model does not require sites to participate or change their current practice of providing direct access to their data. It is a model that may be utilized at the site's convenience, perhaps addressing sensitive or high-profile data first.
  2. Since the DAS would run as a centralized service (potentially distributed) at the LTER Network Office, tools and enhancements based on the DAS model would be available to all participating sites, including data access reports that can be perused directly by NSF officials. This can be an effective method for standing groups like the Information Manager Executive committee or the LTER Executive Board to analyze LTER data access through a single interface.
  3. The DAS model fits nicely within the current LTER LDAP user registry used by the LTER Metacat for user identification. Other Metacat sites (and their users) would not have to conform to the LTER Data Access Policy, but their users would have to register with the DAS before being allowed access to LTER data.

Disadvantages

  1. The current DAS proof-of-concept relies on the use of HTTP cookies for identifying registered users. Cookies, when enabled by the web browser, are sent automatically with each client request to the web server (in this case, the DAS). Any other application (e.g., Kepler or MATLAB) that cannot send a cookie would automatically fail the user identification process and not be allowed access to LTER data. A more robust method for providing generic identification would have to be identified.
  2. The DAS model requires sites to change their data access URLs within their EML documents and/or any data references that would be bound by the LTER Data Access Policy.
  3. A new registration interface would be required to collect the necessary Data Access Policy information. This would require users to submit new information into the DAS, even those who are already registered in the LTER LDAP.
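
The first disadvantage above can be made concrete with a small sketch of cookie-based identification. It is an illustration only: the cookie name `das_token`, the `identify_user` function, and the in-memory session table are assumptions, and the proof-of-concept may identify users differently. The point is simply that a client that sends no `Cookie` header cannot be identified.

```python
from http.cookies import SimpleCookie
from typing import Dict, Optional

COOKIE_NAME = "das_token"  # hypothetical cookie name


def identify_user(
    headers: Dict[str, str], sessions: Dict[str, str]
) -> Optional[str]:
    """Return the user ID tied to the request's cookie, or None.

    `headers` holds the HTTP request headers; `sessions` maps token
    values to user IDs (a stand-in for server-side session state).
    """
    raw_cookie = headers.get("Cookie")
    if raw_cookie is None:
        # Non-browser clients that send no cookie end up here.
        return None
    cookie = SimpleCookie()
    cookie.load(raw_cookie)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None:
        return None
    return sessions.get(morsel.value)
```

A browser request carrying `Cookie: das_token=abc123` resolves to a user when that token is known, while a request from a tool that sends no cookie returns `None` and would be refused.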

Conclusion

The DAS model is one method for sites to easily conform to the LTER Data Access Policy. A fully functioning DAS implementation is expected in early 2009. The LTER NIS Development Team welcomes all comments and suggestions for improving this model, and anticipates working closely with beta-sites to evaluate and test the DAS model. For more detailed information, please refer to the DAS Project Plan.

For additional information – Mark Servilla, LTER Network Office (servilla@lternet.edu)