libdap  Updated for version 3.20.6
libdap4 is an implementation of OPeNDAP's DAP protocol.
libdap::HTTPCache Class Reference

#include <HTTPCache.h>

Public Member Functions

bool cache_response (const string &url, time_t request_time, const vector< string > &headers, const FILE *body)
 
FILE * get_cached_response (const string &url, vector< string > &headers, string &cacheName)
 
FILE * get_cached_response (const string &url, vector< string > &headers)
 
FILE * get_cached_response (const string &url)
 
vector< string > get_conditional_request_headers (const string &url)
 
bool is_url_valid (const string &url)
 
void lock_cache_interface ()
 
void purge_cache ()
 
void release_cached_response (FILE *response)
 
void unlock_cache_interface ()
 
void update_response (const string &url, time_t request_time, const vector< string > &headers)
 
virtual ~HTTPCache ()
 

Static Public Member Functions

static HTTPCache * instance (const string &cache_root, bool force=false)
 

Friends

class HTTPCacheInterruptHandler
 
class HTTPCacheTest
 
class HTTPConnectTest
 

Accessors and Mutators for various properties.

string get_cache_root () const
 
void set_cache_enabled (bool mode)
 
bool is_cache_enabled () const
 
void set_cache_disconnected (CacheDisconnectedMode mode)
 
CacheDisconnectedMode get_cache_disconnected () const
 
void set_expire_ignored (bool mode)
 
bool is_expire_ignored () const
 
void set_max_size (unsigned long size)
 
unsigned long get_max_size () const
 
void set_max_entry_size (unsigned long size)
 
unsigned long get_max_entry_size () const
 
void set_default_expiration (int exp_time)
 
int get_default_expiration () const
 
void set_always_validate (bool validate)
 
bool get_always_validate () const
 
void set_cache_control (const vector< string > &cc)
 
vector< string > get_cache_control ()
 

Detailed Description

Implements a multi-process MT-safe HTTP 1.1 compliant (mostly) cache.

Clients that run as users lacking a writable HOME directory MUST disable this cache. Use Connect::set_cache_enabled(false).

The original design of this class was taken from the W3C libwww software, written by Henrik Frystyk Nielsen, Copyright MIT. See the file MIT_COPYRIGHT. This software is a complete rewrite in C++ with additional features useful to the DODS and OPeNDAP projects.

This cache does not implement range checking. Partial responses should not be cached (HFN's version did, but it doesn't mesh well with the DAP for which this is being written).

The cache uses the local file system to store responses. If it is being used in an MT application, care should be taken to ensure that the number of available file descriptors is not exceeded.

In addition, when used in an MT program, only one thread should use the mutators to set property values. Even though the methods are robust with respect to MT software, having several threads change the values of the cache's properties will lead to odd behavior on the part of the cache. Many of the public methods lock access to the class' interface. This is noted in the documentation for those methods.

Even though the public interface to the cache is typically locked when accessed, an extra locking mechanism is in place for the entries that are accessed. If a thread accesses an entry, that response must be locked to prevent it from being updated until the thread tells the cache that it's no longer using it. The get_cached_response() methods lock an entry; use release_cached_response() to release the lock. Entries are locked using a combination of a counter and a mutex. The following methods block when called on a locked entry: is_url_valid(), get_conditional_request_headers(), update_response(). (The locking scheme could be modified so that a distinction is made between reading from and writing to an entry. In that case is_url_valid() and get_conditional_request_headers() would only lock when an entry is in use for writing, but this has not been implemented.)
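The counter-plus-mutex scheme described above can be pictured with a small stand-alone sketch. The class name EntryLock and its methods are hypothetical and are not part of libdap; the sketch assumes C++11 and only illustrates how readers might bump a counter while a writer waits for it to reach zero.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the lock state of one cache entry; not a libdap type.
class EntryLock {
public:
    // A reader takes the entry: increment the in-use counter.
    void acquire() {
        std::lock_guard<std::mutex> guard(d_mutex);
        ++d_readers;
    }

    // The reader is done: decrement the counter and wake any waiting writer.
    void release() {
        std::lock_guard<std::mutex> guard(d_mutex);
        assert(d_readers > 0 && "release() without a matching acquire()");
        --d_readers;
        if (d_readers == 0) d_idle.notify_all();
    }

    // A writer blocks here until no reader holds the entry.
    void wait_until_idle() {
        std::unique_lock<std::mutex> guard(d_mutex);
        d_idle.wait(guard, [this] { return d_readers == 0; });
    }

    // Current number of readers holding the entry.
    int readers() {
        std::lock_guard<std::mutex> guard(d_mutex);
        return d_readers;
    }

private:
    std::mutex d_mutex;
    std::condition_variable d_idle;
    int d_readers = 0;
};
```

In this sketch wait_until_idle() gives up the mutex when it returns, so a real writer would need to keep holding a lock while it updates the entry. A steady stream of readers could also keep the counter above zero indefinitely, which is the starvation concern raised in the second Todo item.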

Todo:
Update documentation: get_cached_response() now also serves as is_url_in_cache(), and is_url_valid() should only be called after a locked cached response is accessed using get_cached_response(). These lock the cache for reading. The methods cache_response() and update_response() lock an entry for writing.
Todo:
Check that the lock-for-write and lock-for-read work together since it's possible that an entry in use might have a stream of readers and never free the 'read-lock' thus blocking a writer.
Author
James Gallagher jgallagher@opendap.org

Definition at line 103 of file HTTPCache.h.

Constructor & Destructor Documentation

◆ ~HTTPCache()

libdap::HTTPCache::~HTTPCache ( )
virtual

Destroy an instance of HTTPCache. This writes the cache index and frees the in-memory cache table structure. The persistent cache (the response headers and bodies and the index file) are not removed. To remove those, either erase the directory that contains the cache using a file system command or use the purge_cache() method (which leaves the cache directory structure in place but removes all the cached information).

This class uses the singleton pattern. Clients should never call this method. The HTTPCache::instance() method arranges to call the HTTPCache::delete_instance() using atexit(). If delete is called more than once, the result will likely be an index file that is corrupt.

Definition at line 302 of file HTTPCache.cc.
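The singleton-with-atexit arrangement described for the destructor can be sketched in isolation. MiniCache and its members are hypothetical names chosen for illustration, not libdap code, and unlike the real class the sketch makes the destructor private and does no locking.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical miniature of the pattern; names are illustrative only.
class MiniCache {
public:
    static MiniCache *instance(const std::string &root) {
        if (!d_instance) {
            d_instance = new MiniCache(root);
            std::atexit(delete_instance);   // registered once, runs at normal exit
        }
        assert(d_instance != nullptr);
        return d_instance;                  // later calls ignore 'root'
    }

    const std::string &root() const { return d_root; }

private:
    explicit MiniCache(const std::string &root) : d_root(root) {}
    ~MiniCache() {}                         // private: clients cannot delete it

    static void delete_instance() {
        delete d_instance;
        d_instance = nullptr;               // a second call becomes a no-op
    }

    static MiniCache *d_instance;
    std::string d_root;
};

MiniCache *MiniCache::d_instance = nullptr;
```

Resetting the pointer after the delete is one way to make a repeated cleanup call harmless; whether HTTPCache::delete_instance() does the same is not stated in this documentation.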

Member Function Documentation

◆ cache_response()

bool libdap::HTTPCache::cache_response ( const string &  url,
time_t  request_time,
const vector< string > &  headers,
const FILE *  body 
)

Add a new response to the cache, or replace an existing cached response with new data. This method returns true if the information for url was added to the cache. A response might not be cacheable; in that case this method returns false. (For example, the response might contain the 'Cache-Control: no-cache' header.)

Note that the FILE *body is rewound so that the caller can re-read it without using fseek or rewind.

If a response for url is already present in the cache, it will be replaced by the new headers and body. To update a response in the cache with new meta data, use update_response().

This method locks the class' interface.

Parameters
url: A string which holds the request URL.
request_time: The time when the request was made, in seconds since 1 Jan 1970.
headers: A vector of strings which hold the response headers.
body: A FILE * to a file which holds the response body.
Returns
True if the response was cached, False if the response could not be cached.
Exceptions
InternalErr: Thrown if there was an I/O error while writing to the persistent store.

Definition at line 1156 of file HTTPCache.cc.
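The note for cache_response() says the body stream is rewound after it has been read. A minimal sketch of that copy-then-rewind behavior follows; copy_and_rewind is a hypothetical helper written for this illustration and is not a libdap function.

```cpp
#include <cassert>
#include <cstdio>

// Copy every byte of 'body' to 'dest', then rewind 'body' for the caller.
// Returns the number of bytes copied, or -1 on a write error.
// Illustrative helper only; not part of libdap.
long copy_and_rewind(FILE *body, FILE *dest) {
    assert(body && dest);
    std::rewind(body);                       // start reading from the beginning
    char buf[1024];
    long total = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, body)) > 0) {
        if (std::fwrite(buf, 1, n, dest) != n) return -1;
        total += static_cast<long>(n);
    }
    std::rewind(body);                       // leave the stream at the start
    return total;
}
```

After a successful call the caller can read the body again from its first byte without calling fseek or rewind itself.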

◆ get_always_validate()

bool libdap::HTTPCache::get_always_validate ( ) const

Should every cache entry be validated before each use?

Returns
True if all cache entries require validation.

Definition at line 850 of file HTTPCache.cc.

◆ get_cache_control()

vector< string > libdap::HTTPCache::get_cache_control ( )

Get the Cache-Control headers.

Returns
A vector of strings, one string for each header.

Definition at line 920 of file HTTPCache.cc.

◆ get_cache_disconnected()

CacheDisconnectedMode libdap::HTTPCache::get_cache_disconnected ( ) const

Get the cache's disconnected mode property.

Definition at line 676 of file HTTPCache.cc.

◆ get_cache_root()

string libdap::HTTPCache::get_cache_root ( ) const

Get the current cache root directory.

Returns
A string that contains the cache root directory.

Definition at line 516 of file HTTPCache.cc.

◆ get_cached_response() [1/3]

FILE * libdap::HTTPCache::get_cached_response ( const string &  url,
vector< string > &  headers,
string &  cacheName 
)

Get information from the cache. For a given URL, get the headers, cache object name and body stored in the cache. Note that this method increments the hit counter for url's entry and locks that entry. To release the lock, the method release_cached_response() must be called. Methods that block on a locked entry are: get_conditional_request_headers(), update_response() and is_url_valid(). In addition, purge_cache() throws Error if it's called and any entries are locked. The garbage collection system will not reclaim locked entries (but works fine when some entries are locked).

This method locks the class' interface.

This method does not check to see that the response is valid, just that it is in the cache. To see if a cached response is valid, use is_url_valid(). The FILE* returned can be used for both reading and writing. The latter allows a client to update the body of a cached response without having to first dump it all to a separate file and then copy it into the cache (using cache_response()).

Parameters
url: Get response information for this URL.
headers: Return the response headers in this parameter.
cacheName: A value-result parameter; the name of the cache file.
Returns
A FILE * to the response body.
Exceptions
Error: Thrown if the URL's response is not in the cache.
InternalErr: Thrown if the persistent store cannot be opened.

Definition at line 1480 of file HTTPCache.cc.
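Because every successful get_cached_response() call must be paired with release_cached_response(), client code may find a scope guard convenient. The sketch below is one possible shape, not something libdap provides: CachedResponse and FakeCache are hypothetical names, and the guard is a template so that it compiles without the libdap headers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical RAII helper. 'Cache' is any type offering
// get_cached_response(const std::string &) and release_cached_response(FILE *).
template <typename Cache>
class CachedResponse {
public:
    CachedResponse(Cache &cache, const std::string &url)
        : d_cache(cache), d_body(cache.get_cached_response(url)) {}

    // Release the entry lock when the guard goes out of scope.
    ~CachedResponse() { if (d_body) d_cache.release_cached_response(d_body); }

    CachedResponse(const CachedResponse &) = delete;
    CachedResponse &operator=(const CachedResponse &) = delete;

    FILE *body() const { return d_body; }

private:
    Cache &d_cache;
    FILE *d_body;
};

// Stand-in cache used only to exercise the guard. It hands out stdout as a
// placeholder stream so the example needs no files on disk.
struct FakeCache {
    int locked = 0;
    FILE *get_cached_response(const std::string &) { ++locked; return stdout; }
    void release_cached_response(FILE *) { assert(locked > 0); --locked; }
};
```

One design caveat: release_cached_response() is documented to throw Error, and an exception escaping a destructor terminates the program, so production code would likely catch inside the destructor.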

◆ get_cached_response() [2/3]

FILE * libdap::HTTPCache::get_cached_response ( const string &  url,
vector< string > &  headers 
)

Get information from the cache. This is a convenience method that calls the three-parameter version of get_cached_response().

This method locks the class' interface.

Parameters
url: Get response information for this URL.
headers: Return the response headers in this parameter.
Returns
A FILE * to the response body.
Exceptions
Error: Thrown if the URL's response is not in the cache.
InternalErr: Thrown if the persistent store cannot be opened.

Definition at line 1534 of file HTTPCache.cc.

◆ get_cached_response() [3/3]

FILE * libdap::HTTPCache::get_cached_response ( const string &  url)

Get a pointer to a cached response body. This is a convenience method that calls the three-parameter version of get_cached_response().

This method locks the class' interface.

Parameters
url: Find the body associated with this URL.
Returns
A FILE* that points to the response body.
Exceptions
Error: Thrown if the URL is not in the cache.
InternalErr: Thrown if an I/O error is detected.

Definition at line 1551 of file HTTPCache.cc.

◆ get_conditional_request_headers()

vector< string > libdap::HTTPCache::get_conditional_request_headers ( const string &  url)

Build the headers to send along with a GET request to make that request conditional. This method examines the headers for a given response in the cache and formulates the correct headers for a valid HTTP 1.1 conditional GET request. See RFC 2616, Section 13.3.4.

Rules: If an ETag is present, it must be used. Use If-None-Match. If a Last-Modified header is present, use it. Use If-Modified-Since. If both are present, use both (this means that HTTP 1.0 daemons are more likely to work). If a Last-Modified header is not present, use the value of the Cache-Control max-age or Expires header(s). Note that a 'Cache-Control: max-age' header overrides an Expires header (Sec 14.9.3).

This method locks the cache interface and the cache entry.

Parameters
url: Get the HTTPCacheTable::CacheEntry for this URL.
Returns
A vector of strings, one request header per string.
Exceptions
Error: Thrown if the url is not in the cache.

Definition at line 1249 of file HTTPCache.cc.
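The rules listed for get_conditional_request_headers() can be approximated with a short stand-alone function. This is a simplified sketch, not the libdap implementation: conditional_headers and header_value are hypothetical names, header names are matched case-sensitively, and the Cache-Control max-age case is left out because it needs date arithmetic.

```cpp
#include <string>
#include <vector>

// Return the value of header 'name' from "Name: value" strings, or "" if absent.
// Matching is exact-case to keep the sketch short; HTTP header names are
// case-insensitive, so real code should normalize case first.
static std::string header_value(const std::vector<std::string> &headers,
                                const std::string &name) {
    const std::string prefix = name + ": ";
    for (const std::string &h : headers)
        if (h.compare(0, prefix.size(), prefix) == 0)
            return h.substr(prefix.size());
    return "";
}

// Build conditional request headers from the cached response headers.
std::vector<std::string>
conditional_headers(const std::vector<std::string> &cached) {
    std::vector<std::string> out;
    const std::string etag = header_value(cached, "ETag");
    const std::string modified = header_value(cached, "Last-Modified");
    const std::string expires = header_value(cached, "Expires");

    // An ETag, when present, is always sent as If-None-Match.
    if (!etag.empty()) out.push_back("If-None-Match: " + etag);

    // Last-Modified is sent as If-Modified-Since, alongside any ETag.
    if (!modified.empty())
        out.push_back("If-Modified-Since: " + modified);
    else if (!expires.empty())
        out.push_back("If-Modified-Since: " + expires);   // fallback when no Last-Modified

    return out;
}
```

Sending both validators when both are available follows the rule that this makes HTTP 1.0 servers more likely to work.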

◆ get_default_expiration()

int libdap::HTTPCache::get_default_expiration ( ) const

Get the default expiration time used by the cache.

Definition at line 831 of file HTTPCache.cc.

◆ get_max_entry_size()

unsigned long libdap::HTTPCache::get_max_entry_size ( ) const

Get the maximum size of an individual entry in the cache.

Returns
The maximum size in megabytes.

Definition at line 803 of file HTTPCache.cc.

◆ get_max_size()

unsigned long libdap::HTTPCache::get_max_size ( ) const

How big is the cache? The value returned is the size in megabytes.

Definition at line 758 of file HTTPCache.cc.

◆ instance()

HTTPCache * libdap::HTTPCache::instance ( const string &  cache_root,
bool  force = false 
)
static

Get a pointer to the HTTP 1.1 compliant cache. If not already instantiated, this creates an instance of the HTTP cache object and initializes it to use cache_root as the location of the persistent store. If there's an index (.index) file in that directory, it is read as part of the initialization. If the cache has already been initialized, this method returns a pointer to that instance. Note that HTTPCache uses the singleton pattern; a process may have only one instance of this object. Also note that HTTPCache is MT-safe. However, if the force parameter is set to true, it may be possible for two or more processes to access the persistent store at the same time, resulting in undefined behavior.

Default values: is_cache_enabled(): true, is_cache_protected(): false, is_expire_ignored(): false, the total size of the cache is 20M, 2M of that is reserved for response headers, during GC the cache is reduced to at least 18M (total size - 10% of the total size), and the max size for an individual entry is 3M. It is possible to change the size of the cache, but not to make it smaller than 5M. If expiration information is not sent with a response, it is assumed to expire in 24 hours.

Parameters
cache_root: The fully qualified pathname of the directory which will hold the cache data (i.e., the persistent store).
force: Force access to the persistent store if true. By default false. Use this only if you're sure no one else is using the same cache root! This is included so that programs may use a cache that was left in an inconsistent state.
Returns
A pointer to the HTTPCache object.
Exceptions
Error: Thrown if the cache root cannot be set.

Definition at line 129 of file HTTPCache.cc.
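The garbage collection figure quoted in the defaults for instance() (20M reduced to 18M) is simply the total minus ten percent of the total. The worked arithmetic below uses whole megabytes; the struct and function names are hypothetical and only restate the documented defaults.

```cpp
// Documented defaults, in megabytes and seconds. Hypothetical struct for
// illustration; libdap does not expose its defaults in this form.
struct CacheDefaults {
    unsigned long total_mb = 20;             // total cache size
    unsigned long max_entry_mb = 3;          // largest single entry
    unsigned long min_total_mb = 5;          // smallest allowed total size
    int default_expiration_s = 24 * 60 * 60; // 24 hours
};

// Size the cache is reduced to during garbage collection: total minus 10%.
unsigned long gc_target_mb(unsigned long total_mb) {
    return total_mb - total_mb / 10;
}
```

With the default total of 20 the target is 20 - 2 = 18. Integer division means that totals below 10 lose nothing in this sketch; code that works in bytes would not have that rounding effect.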

◆ is_cache_enabled()

bool libdap::HTTPCache::is_cache_enabled ( ) const

Is the cache currently enabled?

Definition at line 647 of file HTTPCache.cc.

◆ is_url_valid()

bool libdap::HTTPCache::is_url_valid ( const string &  url)

Look in the cache and return the status (validity) of the cached response. This method should be used to determine if a cached response requires validation.

This method locks the class' interface and the cache entry.

Parameters
url: Find the cached response associated with this URL.
Returns
True indicates that the response can be used, False indicates that it must first be validated.
Exceptions
Error: Thrown if the URL's response is not in the cache.

Definition at line 1388 of file HTTPCache.cc.

◆ purge_cache()

void libdap::HTTPCache::purge_cache ( )

Purge both the in-memory cache table and the contents of the cache on disk. This method deletes every entry in the persistent store but leaves the structure intact. The client of HTTPCache is responsible for making sure that all threads have released any responses they pulled from the cache. If this method is called when a response is still in use, it will throw an Error object and not purge the cache.

This method locks the class' interface.

Exceptions
Error: Thrown if an attempt is made to purge the cache when an entry is still in use.

Definition at line 1600 of file HTTPCache.cc.

◆ release_cached_response()

void libdap::HTTPCache::release_cached_response ( FILE *  body)

Call this method to inform the cache that a particular response is no longer in use. When a response is accessed using get_cached_response(), it is locked so that updates and removal (e.g., by the garbage collector) are not possible. Calling this method frees that lock.

This method locks the class' interface.

Parameters
body: Release the lock on the response information associated with this FILE *.
Exceptions
Error: Thrown if body does not belong to an entry in the cache or if the entry was already released.

Definition at line 1571 of file HTTPCache.cc.

◆ set_always_validate()

void libdap::HTTPCache::set_always_validate ( bool  validate)

Should every cache entry be validated?

Parameters
validate: True if every cache entry should be validated before being used.

Definition at line 841 of file HTTPCache.cc.

◆ set_cache_control()

void libdap::HTTPCache::set_cache_control ( const vector< string > &  cc)

Set the request Cache-Control headers. If a request must be satisfied using HTTP, these headers should be included in request since they might be pertinent to a proxy cache.

Ignored headers: no-transform, only-if-cached. These headers are not used by HTTPCache and are not recorded. However, if present in the vector passed to this method, they will be present in the vector returned by get_cache_control.

This method locks the class' interface.

Parameters
cc: A vector of strings, each string holds one Cache-Control header.
Exceptions
InternalErr: Thrown if one of the strings in cc does not start with 'Cache-Control: '.

Definition at line 872 of file HTTPCache.cc.
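The prefix check and the ignored directives described for set_cache_control() can be illustrated with a stand-alone parser. This is a sketch under assumptions: parse_cache_control is a hypothetical function, and it throws std::invalid_argument where the documentation names InternalErr so that it compiles without libdap.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Check each string for the required prefix and return the directive values
// the cache would act on. Illustrative only; not the libdap implementation.
std::vector<std::string>
parse_cache_control(const std::vector<std::string> &cc) {
    const std::string prefix = "Cache-Control: ";
    std::vector<std::string> directives;
    for (const std::string &line : cc) {
        // Every string must be a complete Cache-Control header.
        if (line.compare(0, prefix.size(), prefix) != 0)
            throw std::invalid_argument("Expected a Cache-Control header: " + line);

        const std::string value = line.substr(prefix.size());
        // Directives the cache does not use are skipped here.
        if (value == "no-transform" || value == "only-if-cached") continue;
        directives.push_back(value);
    }
    return directives;
}
```

A caller would pass strings such as "Cache-Control: max-age=60"; a string such as "Pragma: no-cache" is rejected because it lacks the prefix.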

◆ set_cache_disconnected()

void libdap::HTTPCache::set_cache_disconnected ( CacheDisconnectedMode  mode)

Set the cache's disconnected property. The cache can operate either disconnected from the network or using a proxy cache (but tell that proxy not to use the network).

This method locks the class' interface.

Parameters
mode: One of DISCONNECT_NONE, DISCONNECT_NORMAL or DISCONNECT_EXTERNAL.
See also
CacheDisconnectedMode

Definition at line 664 of file HTTPCache.cc.

◆ set_cache_enabled()

void libdap::HTTPCache::set_cache_enabled ( bool  mode)

Enable or disable the cache. The cache can be temporarily suspended using the enable/disable property. This does not prevent the cache from being enabled or disabled at a later point in time.

Default: yes

This method locks the class' interface.

Parameters
mode: True if the cache should be enabled, False if it should be disabled.

Definition at line 635 of file HTTPCache.cc.

◆ set_default_expiration()

void libdap::HTTPCache::set_default_expiration ( int  exp_time)

Set the default expiration time. Use the default expiration property to determine when a cached response becomes stale if the response lacks the information necessary to compute a specific value.

Default: 24 hours (86,400 seconds)

This method locks the class' interface.

Parameters
exp_time: The time in seconds.

Definition at line 819 of file HTTPCache.cc.
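How the default expiration set by set_default_expiration() might enter a staleness decision can be sketched as a freshness-lifetime calculation in the style of HTTP 1.1 caching. The Freshness struct and freshness_lifetime function are hypothetical, and the order of precedence shown is an assumption based on the rule that max-age overrides Expires.

```cpp
#include <ctime>

// Values a cache might record for an entry; -1 means "header absent".
// Hypothetical struct, not libdap's CacheEntry.
struct Freshness {
    long max_age;          // seconds, from Cache-Control: max-age
    std::time_t expires;   // from the Expires header
    std::time_t date;      // from the Date header
};

// Seconds a response may be served before it is considered stale.
long freshness_lifetime(const Freshness &f, long default_expiration) {
    if (f.max_age >= 0) return f.max_age;                 // max-age overrides Expires
    if (f.expires >= 0 && f.date >= 0)
        return static_cast<long>(f.expires - f.date);     // Expires relative to Date
    return default_expiration;                            // nothing usable was sent
}
```

Only the last branch uses the default: a response that carries neither max-age nor a usable Expires header falls back to the configured value, 86,400 seconds unless changed.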

◆ set_expire_ignored()

void libdap::HTTPCache::set_expire_ignored ( bool  mode)

How should the cache handle the Expires header? Default: no (the Expires header is not ignored).

This method locks the class' interface.

Parameters
mode: True if a response's Expires header should be ignored, False otherwise.

Definition at line 690 of file HTTPCache.cc.

◆ set_max_entry_size()

void libdap::HTTPCache::set_max_entry_size ( unsigned long  size)

Set the maximum size for a single entry in the cache.

Default: 3M

This method locks the class' interface.

Parameters
size: The size in megabytes.

Definition at line 772 of file HTTPCache.cc.

◆ set_max_size()

void libdap::HTTPCache::set_max_size ( unsigned long  size)

Cache size management. The default cache size is 20M. The minimum size is 5M, which avoids problems while writing the cache. The size is given in megabytes. Note that reducing the size of the cache may trigger a garbage collection operation.

Note
The maximum cache size is UINT_MAX bytes (usually 4294967295 for 32-bit computers). If size is larger the value will be truncated to the value of that constant. It seems pretty unlikely that will happen given that the parameter is an unsigned long. This is a fix for bug 689 which was reported when the parameter type was signed.

This method locks the class' interface.

Parameters
size: The maximum size of the cache in megabytes.

Definition at line 724 of file HTTPCache.cc.
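The two limits described for set_max_size(), a 5M floor and a ceiling of UINT_MAX bytes, can be shown as a small conversion function. This is a sketch under assumptions: the names are hypothetical, a megabyte is taken to be 2^20 bytes, and requests below the minimum are assumed to be raised to the minimum rather than rejected.

```cpp
#include <climits>

const unsigned long MEGA = 0x100000UL;   // bytes per megabyte (2^20, an assumption)
const unsigned long MIN_TOTAL_MB = 5;    // documented minimum cache size

// Convert a requested size in megabytes to bytes, applying both limits.
unsigned long cache_size_in_bytes(unsigned long size_mb) {
    if (size_mb < MIN_TOTAL_MB) size_mb = MIN_TOTAL_MB;   // assumed: raise to the minimum
    if (size_mb > UINT_MAX / MEGA) return UINT_MAX;       // cap before the multiply can overflow
    return size_mb * MEGA;
}
```

Comparing against UINT_MAX / MEGA before multiplying keeps the calculation safe even where unsigned long is only 32 bits wide.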

◆ update_response()

void libdap::HTTPCache::update_response ( const string &  url,
time_t  request_time,
const vector< string > &  headers 
)

Update the meta data for a response already in the cache. This method provides a way to merge response headers returned from a conditional GET request, for the given URL, with those already present.

This method locks the class' interface and the cache entry.

Parameters
url: Update the meta data for this cache entry.
request_time: The time (Unix time, seconds since 1 Jan 1970) that the conditional request was made.
headers: New headers, one header per string, returned in the response.
Exceptions
Error: Thrown if the url is not in the cache.

Definition at line 1319 of file HTTPCache.cc.
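The merge that update_response() performs on headers from a conditional GET can be pictured as "new values replace old ones with the same name". The sketch below assumes that rule and compares header names exactly; merge_headers and header_name are hypothetical names, and the real method may differ in details such as case handling and ordering.

```cpp
#include <string>
#include <vector>

// Name part of a "Name: value" header (everything before the first colon).
static std::string header_name(const std::string &header) {
    return header.substr(0, header.find(':'));
}

// Merge 'fresh' into 'stored': a fresh header replaces a stored header with
// the same name; any other fresh header is appended.
std::vector<std::string> merge_headers(std::vector<std::string> stored,
                                       const std::vector<std::string> &fresh) {
    for (const std::string &h : fresh) {
        bool replaced = false;
        for (std::string &s : stored) {
            if (header_name(s) == header_name(h)) {
                s = h;               // newer value wins
                replaced = true;
                break;
            }
        }
        if (!replaced) stored.push_back(h);
    }
    return stored;
}
```

Stored headers that the conditional response does not mention, such as Content-Type, survive the merge unchanged.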


The documentation for this class was generated from the following files: