PDC 2008, Day #3, Session #3, 1 hr 15 mins
Pablo Castro, Niranjan Nilakantan
Pablo and Niranjan did a session that went into some more detail on how Azure Tables can be used to store data in the cloud.
Context
This talk dealt with the “Scalable Storage” part of the new Azure Services platform. Scalable Storage is a mechanism by which applications can store data “in the cloud” in a highly scalable manner.
Data Types
There are three fundamental data types available to applications using Azure Storage Services:
- Blobs
- Tables
- Queues
This session focused mainly on Tables. Specifically, Niranjan and Pablo addressed the different ways that an application might access the storage service programmatically.
Tables
Tables are a “massively scalable” data type for cloud-based storage. They can store billions of rows, are highly available, and are “durable”. The Azure platform takes care of scaling the data out to multiple servers automatically, if necessary (with some hints from the developer).
Programming Model
Azure Storage Services are accessed through ADO.NET Data Services (Astoria). Using ADO.NET Data Services, there are basically two ways for an application to access the service:
- .NET API (System.Data.Services.Client)
- REST interface (using HTTP URIs directly; a rough sketch follows below)
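As an illustration of the REST style (not code from the session), the query below fetches every entity in one partition of a hypothetical NewsItems table over plain HTTP. The account name, table name, and date value are invented, and the required Shared Key Authorization header is omitted, so treat this as a sketch of the URI shape rather than a working client.

```csharp
using System;
using System.IO;
using System.Net;

class RestQueryExample
{
    static void Main()
    {
        // Hypothetical storage account and table; the $filter expression
        // selects every entity in a single partition (one day's data).
        string uri = "http://myaccount.table.core.windows.net/NewsItems()"
                   + "?$filter=PartitionKey%20eq%20'2008-10-29'";

        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        // NOTE: a real request must also carry an Authorization header
        // signed with the storage account key; omitted in this sketch.

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());  // Atom feed of entities
        }
    }
}
```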
Data Model
It’s important to note that Azure knows nothing about your data model. It does not store data in a relational database or access it via a relational model. Rather, you specify a Table that you’d like to store data in, along with a simple query expression for the data that you’d like to retrieve.
A Table holds a collection of rows, each of which represents a single Entity. A row is uniquely identified by the combination of a Partition Key and a Row Key, both of which the developer specifies. The Partition Key is also what Azure uses to decide how to split the data across multiple servers.
Beyond the Row Key and Partition Key, the developer can add any other properties that she likes, up to a total of 255 properties. While the Row and Partition Keys must be string data types, the other properties support other data types.
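To make the shape concrete, here is a minimal sketch of such an entity, assuming a hypothetical NewsItem type partitioned by publication date (all names are invented for illustration, not taken from the session):

```csharp
using System;

// Hypothetical entity: one row in a "NewsItems" table.
public class NewsItem
{
    // Required string keys; together they uniquely identify the row.
    public string PartitionKey { get; set; }  // e.g. the date: "2008-10-29"
    public string RowKey { get; set; }        // unique within the partition

    // Any additional properties the application needs.
    public string Headline { get; set; }
    public string Body { get; set; }
    public DateTime PublishedAt { get; set; }
}
```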
Partitioning
Azure storage services are meant to be automatically scalable, meaning that the data will be automatically spread across multiple servers, as needed.
In order to know how to split up the data, Azure uses a developer-specified Partition Key, which is one of the properties of each record. (Think “field” or “column”).
The developer should pick a partition key that makes sense for his application. It’s important to remember two things:
- Querying for all data having a single value for a partition key is cheap
- Querying for data having multiple partition key values is more expensive
For example, if your application often retrieves data by date and typically shows data for a single day, then it would make sense to have a CurrentDate property in your data entity and to make that property the Partition Key.
The way to think of this is that each possible unique value of the Partition Key represents a “bucket” that will contain one or more records. If you pick a key that results in only one record per bucket, that would be inefficient. But if you pick a key that groups into one bucket the records you are likely to ask for together, this will be efficient.
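In terms of the hypothetical NewsItems table above, the difference looks roughly like this (these are illustrative OData-style $filter expressions, not ones shown in the talk):

```csharp
// Cheap: a single Partition Key value, answered from one "bucket":
string oneDay =
    "$filter=PartitionKey eq '2008-10-29'";

// More expensive: a range of Partition Key values, which may fan out
// across several partitions (and therefore several servers):
string oneMonth =
    "$filter=PartitionKey ge '2008-10-01' and PartitionKey lt '2008-11-01'";
```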
Accessing the Data Programmatically
Pablo demonstrated creating the classes required to access data stored in an Azure storage service.
He started by creating a class representing the data entity to be stored in a single table. He defined properties for the Partition and Row Keys, as well as additional properties to hold any other desired data.
Pablo also recommended that you create a single class to act as an entry point into the system. This class then acts as a service entry point for all of the data operations that your client application would like to perform.
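A minimal sketch of what such an entry-point class might look like, building on the hypothetical NewsItem entity above (the class and table names are invented; the session's own code may have differed):

```csharp
using System;
using System.Linq;
using System.Data.Services.Client;

// Hypothetical service entry point: one class that exposes each table
// as a queryable set, built on the ADO.NET Data Services client.
public class NewsDataContext : DataServiceContext
{
    public NewsDataContext(Uri serviceRoot) : base(serviceRoot) { }

    // The "NewsItems" table, exposed as an IQueryable for LINQ queries.
    public IQueryable<NewsItem> NewsItems
    {
        get { return CreateQuery<NewsItem>("NewsItems"); }
    }
}
```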
He also demonstrated using LINQ to run queries against the Azure storage service. The client library automatically creates the corresponding URIs and HTTP requests to retrieve, create, update, or delete the data.
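For example, a query along these lines (again using the hypothetical context and account name from above) is turned into a URI with a $filter expression on the Partition Key:

```csharp
using System;
using System.Linq;

class QueryExample
{
    static void Main()
    {
        var ctx = new NewsDataContext(
            new Uri("http://myaccount.table.core.windows.net"));

        // Translated by the client into roughly:
        //   GET /NewsItems()?$filter=PartitionKey eq '2008-10-29'
        var todaysNews = from item in ctx.NewsItems
                         where item.PartitionKey == "2008-10-29"
                         select item;

        foreach (var item in todaysNews)   // the query executes on enumeration
            Console.WriteLine(item.Headline);
    }
}
```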
Miscellaneous
Pablo and Niranjan also touched on a few other issues that most applications will deal with:
- Dealing with concurrent updates (using ETags and If-Match; see the sketch after this list)
- Pagination (using continuation tokens)
- Using Azure Queues for pseudo-transactional deletion of data
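As a rough sketch of the concurrency piece (reusing the hypothetical NewsDataContext from above, not code from the session): the ADO.NET Data Services client remembers each entity's ETag and sends it back in an If-Match header on update, so a conflicting change shows up as an error instead of a silent overwrite.

```csharp
using System;
using System.Linq;
using System.Data.Services.Client;

class ConcurrencyExample
{
    static void Main()
    {
        var ctx = new NewsDataContext(
            new Uri("http://myaccount.table.core.windows.net"));

        // Fetch one row; the client records its ETag behind the scenes.
        var item = ctx.NewsItems
            .Where(n => n.PartitionKey == "2008-10-29" && n.RowKey == "001")
            .AsEnumerable()
            .First();

        item.Headline = "Corrected headline";
        ctx.UpdateObject(item);

        try
        {
            // The update carries an If-Match header with the stored ETag;
            // if another client changed the row first, the server rejects
            // the write and the client raises an exception.
            ctx.SaveChanges();
        }
        catch (DataServiceRequestException)
        {
            // Conflict: re-query the row and retry the update.
        }
    }
}
```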
Takeaways
Pablo and Niranjan demonstrated that it is quite straightforward to access Azure storage services from a .NET application. Non-.NET stacks can also make use of the same services through the simple REST protocol.
It was also helpful to see how Pablo used ADO.NET Data Services to construct a service layer on top of the Azure storage services. This seems to make consuming the data pretty straightforward.
(I still might have this a little confused—it’s possible that Astoria was just being used to wrap Azure services, rather than exposing the data in an Astoria-based service to the client. I need to look at the examples in a little more detail to figure this out).
Original Materials
You can find the video of the session at: http://mschnlnine.vo.llnwd.net/d1/pdc08/WMV-HQ/ES07.wmv