Wednesday, August 5, 2009

Rackspace CloudFiles

CloudFiles is a cloud storage offering from Rackspace. You can use it for archiving and backing up your web-based storage data. The data is replicated across 3 locations giving you excellent redundancy.

I tried out their offering and here are some of my notes:

- Max file size of 5GB. Most cloud storage vendors have a limit like this, largely due to protocol limitations.
- Along with CloudFiles they provide a Content Distribution Network (CDN) for distributing data across their data centers around the world.
- 15 cents/GB/month storage cost. Bandwidth is 8 cents/GB inbound and 22 cents/GB outbound.
- APIs in multiple languages - PHP, Java, C#, Python, Ruby
- They have a "browser panel" which can be used to upload files and distribute using CDN.
- Mozilla extension + Mac app to access CloudFiles storage is available. Not maintained by Rackspace though and *not* a backup tool, it's just a front-end.
- Provides SSL for security
- Tokens expire every 24 hours and clients need to reconnect. Every operation must use an authentication token.
- FireUploader simulates a hierarchical file system structure, which CloudFiles itself does not provide.
- Cyberduck, a Mac client, is GPL software and a good front-end for CloudFiles.
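
To get a feel for the pricing quoted above, here is a minimal sketch that turns the three rates into a monthly estimate. The workload numbers in the example are made up for illustration, and the function name is my own, not part of any Rackspace API.

```python
# Rates quoted in the notes above, in US dollars per GB.
STORAGE_PER_GB_MONTH = 0.15
INBOUND_PER_GB = 0.08
OUTBOUND_PER_GB = 0.22


def monthly_cost(stored_gb, inbound_gb, outbound_gb):
    """Estimate one month's bill in dollars from the three quoted rates."""
    return (stored_gb * STORAGE_PER_GB_MONTH
            + inbound_gb * INBOUND_PER_GB
            + outbound_gb * OUTBOUND_PER_GB)


if __name__ == "__main__":
    # Hypothetical workload: 100 GB stored, 20 GB uploaded, 50 GB downloaded.
    print(f"${monthly_cost(100, 20, 50):.2f}")
```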

- The only way a user can access content from another account is if the owner shares their username/API access key or a session token.
- Files are called "objects" in Rackspace lingo. Data is saved as-is without compression/encryption. It does support metadata in the form of key-value pairs, so you can "tag" files and organize your data. Metadata is limited to 4KB and a max of 90 individual tags can be stored.
- You can enable CDN on selected containers. Each CDN-enabled container has a unique Uniform Resource Locator (URL). For example, a CDN-enabled container named "photos" can be referenced as - if this container has an image named "baby.jpg" then that image can be served through LimeLight Networks CDN with the URL of This is how we can enable SaaS applications to use data stored on our customers' accounts.
- The API documentation is good, and code snippets and examples are available.
- The APIs are RESTful, i.e. they follow the Representational State Transfer (REST) style over HTTP.
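
Since every operation needs an authentication token and tokens expire after 24 hours, a client has to re-authenticate now and then. Below is a rough sketch of one way to handle that. `TokenCache`, the `authenticate` callable and the five-minute safety margin are my own assumptions for illustration, not something from the Rackspace client libraries; the token is sent in the `X-Auth-Token` request header.

```python
import time

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # tokens expire every 24 hours


class TokenCache:
    """Caches an auth token and re-authenticates once it is (nearly) expired."""

    def __init__(self, authenticate, clock=time.time, safety_margin=300):
        # `authenticate` is a hypothetical callable that logs in with the
        # username/API key and returns a fresh token string.
        self._authenticate = authenticate
        self._clock = clock
        self._safety_margin = safety_margin  # refresh a little early (assumption)
        self._token = None
        self._issued_at = None

    def get(self):
        """Return a valid token, fetching a new one if the old one is stale."""
        now = self._clock()
        stale = (self._token is None or
                 now - self._issued_at >= TOKEN_LIFETIME_SECONDS - self._safety_margin)
        if stale:
            self._token = self._authenticate()
            self._issued_at = now
        return self._token

    def headers(self):
        """Headers to attach to every storage request."""
        return {"X-Auth-Token": self.get()}
```

The clock is injectable so the expiry logic can be exercised without waiting a day.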

- You need to do a GET call on the account to get a list of all containers in the account. The list can also be returned in JSON or XML format.
- A max of 10,000 container names can be returned at a time. You can make a continuation call with a "marker" if you want to retrieve later containers.
- A HEAD call on the account is used to find out the number of containers in the account and the number of bytes used.
- A GET operation on a container is used to list the objects in the container.
- Pseudo hierarchical folders/directories
- Users can simulate a hierarchical structure in Cloud Files by following a few guidelines: Object names must use the forward slash character ‘/’ as a path element separator, and “directory marker” Objects must also be created. The nested structure can then be traversed with the new “path” query parameter. This is best illustrated by example. For the purposes of this example, the Container where the Objects reside is called “backups”. All Objects in this example start with a prefix of “photos”, which should NOT be confused with the Container name.
In the example, the following “real” Objects are uploaded to the storage system with names representing their full filesystem path.


To take advantage of this feature, the “directory marker” Objects must also be created to represent the appropriate directories. The following additional Objects need to be created. A good convention would be to create these as zero or one byte files with a Content-Type of “application/directory”.

These “directories” can now be traversed by issuing a GET request against the Container name with the “path” query parameter set to the directory to list. Only the request line and results are depicted below; other request/response headers are excluded.

GET /v1/AccountString/backups?path=photos HTTP/1.1


To traverse down into the “animals” directory, specify that path.

GET /v1/AccountString/backups?path=photos/animals


By combining this “path” query parameter with the “format” query parameter, users will be able to easily distinguish between virtual folders/directories by Content-Type and build interfaces that allow traversal of the pseudo-nested structure.
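
To make the pseudo-directory idea concrete, here is a small local sketch of what the “path” listing amounts to. It does not talk to the service; it only approximates the filtering on a plain list of names. The object names below are hypothetical ones I made up, and both function names are my own.

```python
def directory_markers(object_names):
    """Return the "directory marker" names implied by slash-separated names."""
    markers = set()
    for name in object_names:
        parts = name.split("/")[:-1]          # every component except the file
        for depth in range(1, len(parts) + 1):
            markers.add("/".join(parts[:depth]))
    return sorted(markers)


def list_path(object_names, path):
    """Approximate a `path` query: names exactly one level below `path`."""
    prefix = path.rstrip("/") + "/"
    result = []
    for name in object_names:
        rest = name[len(prefix):]
        # Keep direct children only: non-empty remainder with no further slash.
        if name.startswith(prefix) and rest and "/" not in rest:
            result.append(name)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical "real" objects stored in the "backups" container.
    real = ["photos/animals/cat.jpg", "photos/animals/dog.jpg", "photos/me.jpg"]
    everything = real + directory_markers(real)
    print(list_path(everything, "photos"))
    print(list_path(everything, "photos/animals"))
```

Note that the sub-directory only shows up in the first listing because its marker object exists, which is why the markers have to be created.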

- DELETE call on container will not succeed if it has objects
- HEAD operation on object is used to retrieve object metadata and standard HTTP headers
- GET call is used to retrieve object data. It supports headers like If-Match, If-None-Match, If-Modified-Since, If-Unmodified-Since. It is possible to get a range of bytes as well.
- PUT call is used to write or overwrite an object's metadata and content. End-to-end data integrity can be ensured by including an MD5 checksum of your object's data in the ETag header.
- Chunked requests can be sent if you do not know the size of the object you are PUT'ing, but the total size must be less than 5GB.
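
As a sketch of what a client might do before a PUT, the snippet below builds the request headers: the MD5 checksum in ETag plus the metadata tags, checked against the limits from my notes. `build_put_headers` is a hypothetical helper. Treating 5GB as 5 * 1024^3 bytes and counting the 4KB as the summed key and value bytes are my assumptions; the service may count differently. Metadata is sent as `X-Object-Meta-` prefixed headers.

```python
import hashlib

MAX_OBJECT_BYTES = 5 * 1024 ** 3   # 5GB limit (binary GB is an assumption)
MAX_METADATA_BYTES = 4 * 1024      # 4KB metadata limit
MAX_METADATA_TAGS = 90             # max number of individual tags


def build_put_headers(data, metadata=None):
    """Build headers for an object PUT, validating the documented limits."""
    metadata = metadata or {}
    if len(data) > MAX_OBJECT_BYTES:
        raise ValueError("object exceeds the 5GB limit")
    if len(metadata) > MAX_METADATA_TAGS:
        raise ValueError("more than 90 metadata tags")
    # Assumption: the 4KB limit is measured over keys plus values.
    size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())
    if size > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds 4KB")

    headers = {
        "Content-Length": str(len(data)),
        # MD5 of the body lets the server verify what it received.
        "ETag": hashlib.md5(data).hexdigest(),
    }
    for key, value in metadata.items():
        headers["X-Object-Meta-" + key] = value
    return headers
```

For a chunked upload the size is not known up front, so the checksum would have to be computed incrementally instead; this sketch only covers the case where the whole body is in memory.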

Shortcomings mentioned by Rackspace themselves:
- Cannot mount or map a CloudFiles account as a network drive.
- Files cannot be modified in place, so block-level changes cannot be done.
- Containers cannot be nested.
- No ACLs for security.

cURL is a command-line tool available on most UNIX environments. It lets you send HTTP requests and receive the responses from the command line or from within a shell script, so you can work with the REST API directly instead of using the client APIs. I used the cURL tool to test out the calls provided by CloudFiles.
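
The same direct-to-REST approach works from a script. Here is a rough sketch of the container listing with "marker" continuation described earlier, using only the Python standard library. The function names are my own, the storage URL and token are assumed to come from a prior authentication step, and the JSON shape (a list of entries with a "name" field) plus the "limit" query parameter are assumptions you should check against the API documentation.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(url, token):
    """GET `url` with the auth token header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-Auth-Token": token})
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else []   # empty body -> no containers


def list_all_containers(storage_url, token, fetch=http_get_json, page_size=10000):
    """Collect every container name, following the marker page by page."""
    names = []
    marker = None
    while True:
        query = {"format": "json", "limit": page_size}
        if marker is not None:
            query["marker"] = marker          # continue after the last name seen
        page = fetch(storage_url + "?" + urllib.parse.urlencode(query), token)
        if not page:
            break
        names.extend(entry["name"] for entry in page)
        if len(page) < page_size:             # short page means we are done
            break
        marker = names[-1]
    return names
```

The `fetch` argument is there so the paging logic can be tried against a fake function before pointing it at a real account.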


edjy said...

Hi Kalpak,

Excellent summary of our Cloud Files system! I hope you enjoy using the service and let us know if we can help.


Anonymous said...