Sean’s Stuff

Various musings on software development and technology.

Session – Services Symposium: Enterprise Grade Cloud Applications

Posted by Sean on 1 November, 2008

PDC 2008, Day #4, Session #2, 1 hr 30 mins

Eugenio Pace

My second session on Thursday was a continuation of the cloud services symposium from the first session.  There was a third part to the symposium, which I did not attend.

The presenter for this session, Eugenio, was not nearly as good a presenter as Gianpaolo from the previous session.  So it was a bit less dynamic, and harder to stay interested.

The session basically consisted of a single demo, which illustrated some of the possible solutions to the Identity, Monitoring, and Integration challenges mentioned in the previous session.

Identity

Eugenio pointed out the problems involved in authentication/authorization.  You don’t want to require the enterprise users to have a unique username/password combination for each service that they use.  And pushing out the enterprise (e.g. Active Directory) credential information to the third party service is not secure and creates a management nightmare.

The proposed solution is to use a central (federated) identity system to do the authentication and authorization.  This is the purpose of the Azure Access Control service.

Management

The next part of the demo showed how Azure supports remote management, on the part of IT staff at an individual customer site, of their instance of your application.  The basic things that you can do remotely include:

  • Active real-time monitoring of application health
  • Trigger administrative actions, based on the current state

The end result (and goal) is that you have the same scope of control over your application as you’d have if it were on premises.

Application Integration

Finally, Eugenio did some demos related to “process integration”—allowing your service to be called from a legacy service or system.  This demo actually woke everyone up, because Eugenio brought an archaic green-screen AS400 system up in an emulator and proceeded to have it talk to his Azure service.

Takeaways

The conclusions were recommendations to both IT organizations and ISVs:

  • Enterprise IT organization
    • Don’t settle for sub-optimal solutions
    • Tap into the benefits of Software+Services
  • ISV
    • Don’t give them an excuse to reject your solution
    • Make use of better tools, frameworks, and services

Posted in Azure, PDC 2008 | Tagged: , , , , , , , , | No Comments »

Session – Services Symposium: Expanding Applications to the Cloud

Posted by Sean on 1 November, 2008

PDC 2008, Day #4, Session #1, 1 hr 30 mins

Gianpaolo Carraro

As the last day of PDC starts, I’m down to four sessions to go.  I’ll continue doing a quick blog post on each session, where I share my notes, as well as some miscellaneous thoughts.

The Idea of a Symposium

Gianpaolo started out by explaining that they were finishing PDC by doing a pair of symposiums, each a series of three different sessions.  One symposium focused on parallel computing and the other on cloud-based services.  This particular session was the first in the set of three that addressed cloud services.

The idea of a symposium, explained Gianpaolo, is to take all of the various individual technologies and try to sort of fit the puzzle pieces together, providing a basic context.

The goal was also present some of the experience that Microsoft has gained in early usage of the Azure platform over the past 6-12 months.  He said that he himself has spent the last 6-12 months using the new Services, so he had some thoughts to share.

This first session in the symposium focused on taking existing business applications and expanding them to “the cloud”.  When should an ISV do this?  Why?  How?

Build vs. Buy and On-Premises vs. Cloud

Gianpaolo presented a nice matrix showing the two basic independent decisions that you face when looking for software to fulfill a need.

  • Build vs. Buy – Can I buy a packaged off-the-shelf product that does what I need?  Or are my needs specialized enough that I need to build my own stuff?
  • On-Premises vs. Cloud – Should I run this software on my own servers?  Or host everything up in “the cloud”?

There are, of course, tradeoffs on both sides of each decision.  These have been discussed ad infinitum elsewhere, but the basic tradeoffs are:

  • Build vs. Buy – Features vs. Cost
  • On-Premises vs. Cloud – Control vs. Economy of Scale

Here’s the graph that Gianpaolo presented, showing six different classes of software, based on how you answer these questions.  Note that on the On-Premises vs. Cloud scale, there is a middle column that represents taking applications that you essentially control and moving them to co-located servers.

This is a nice way to look at things.  It shows that, for each individual software function, it can live anywhere on this graph.  In fact, Gianpaolo’s main point is that you can deploy different pieces of your solution at different spots on the graph.

So the idea is that while you might start off on-premises, you can push your solution out to either a co-located hosting server or to the cloud in general.  This is true of both packaged apps as well as custom-developed software.

Challenges

The main challenge in moving things out of the enterprise is dealing with the various issues that show up now when your data needs to cross the corporate/internet boundary.

There are several separate types of challenges that show up:

  • Identify challenges – as you move across various boundaries, how does the software know who you are and what you’re allowed to access?
  • Monitoring and Management challenges – how do you know if your application is healthy, if it’s running out in the cloud?
  • Application Integration challenge – how do various applications communicate with each other, across the various boundaries?

Solutions to the Identity Problem

Gianpaolo proposed the following possible solutions to this problem of identity moving across the different boundaries:

  • Federated ID
  • Claim-based access control
  • Geneva identity system, or Cardspace

The basic idea was that Microsoft has various assets that can help with this problem.

Solutions to the Monitoring and Management Problem

Next, the possible solutions to the monitoring and management problem included:

  • Programmatic access to a “Health” model
  • Various management APIs
  • Firewall-friendly protocols
  • Powershell support

Solutions to the Application Integration Problem

Finally, some of the proposed solutions to the application integration problem included:

  • ServiceBus
  • Oslo
  • Azure storage
  • Sync framework

The ISV Perspective

The above issues were all from an IT perspective.  But you can look at the same landscape from the perspective of an independent software vendor, trying to sell solutions to the enterprise.

To start with, there are two fundamentally different ways that the ISV can make use of “the cloud”:

  • As a service mechanism, for delivering your services via the cloud
    • You make your application’s basic services available over the internet, no matter where it is hosted
    • This is mostly a customer choice, based on where they want to deploy
  • As a platform
    • Treating the cloud as a platform, where your app runs
    • Benefits are the economy of scale
    • Mostly an ISV choice
    • E.g. you could use Azure without your customer even being aware of it

When delivering your software as a service, you need to consider things like:

  • Is the feature set available via cloud sufficient?
  • Firewall issues
  • Need a management interface for your customers

Some Patterns

Gianpaolo presented some miscellaneous design considerations and patterns that might apply to applications deployed in the cloud.

Cloudbursting

  • Design for average load, handling the ‘peak’ as an exception
  • I.e. only go to the cloud for scalability when you need to

Worker / Queue / Blob Pattern

Let’s say that you have a task like encoding and publishing of video.  You can push the data out to the cloud, where the encoding work happens.  (Raw data places in a “blob” in cloud storage).  You then add an entry to a queue, indicating that there is work to be done, and a separate worker process eventually does the encoding work.

This is a nice pattern for supporting flexible scaling—both the queues and the worker processes could be scaled out separately.

CAP: Pick 2 out of 3

  • Consistency
  • Availability
  • Tolerance to network Partitioning

Eventual Consistency (ACID – BASE)

The idea here is that we are all used to the ACID characteristics listed below.  We need to guarantee that the data is consistent and correct—which means that performance likely will suffer.  As an example, we have a process submit data synchronously because we need to guarantee that the data gets to its destination.

But Gianpaolo talked about the idea of “eventual consistency”.  For most applications, while it’s important for your data to be correct and consistent, it’s not necessarily for it to be consistent right now.  This leads to a model that he referred to as BASE, with the characteristics listed below.

  • ACID
    • Atomicity
    • Consistency
    • Isolation
    • Durability
  • BASE
    • Basically Available
    • Soft state
    • Eventually consistent

Fundamental Lesson

Basically the main takeaway is:

  • Put the software components in the place that makes the most sense, given their use

Posted in Azure, PDC 2008 | Tagged: , , , , , , , , | No Comments »

Session – Building Mesh-Enabled Web Applications Using the Live Framework

Posted by Sean on 31 October, 2008

PDC 2008, Day #3, Session #5, 1 hr 15 mins

Arash Ghanaie-Sichanie

Throughout the conference, I bounced back and forth between Azure and Live Mesh sessions.  I was trying to make sense of the difference between them and understand when you might use one vs. the other.

I understand that Azure Services is the underlying platform that Live Services, and Live Mesh, are built on.  But at the outset, it still wasn’t quite clear what class of applications Live Services were targeted at.  Which apps would want to use Live Services and which would need to drop down and use Azure?

I think that after Arash’s talk, I have a working stab at answering this question.  This is sort of my current understanding, that I’ll update as things become more clear.

Question: When should I use Live Services / Live Mesh ?

Answer:

  • Create a Live Mesh application (Mesh-enabled) if your customers are currently, or will become, live.com customers.
  • Otherwise, use Azure cloud-based services outside of Live, or the other services built on top of Azure:
    • Azure storage services
    • Sync Framework
    • Service Bus
    • SQL Data Services

Unless I’m missing something, you can’t make use of features of the Live Operating Environment, either as a web application or a local desktop client, unless your application is registered with Live.com, and your user has added your application to his account.

The one possible loophole that I can see is that you might just have your app always authorize, for all users, using a central account.  That account could be pre-created and pre-configured.  Your application would then use the same account for all users, enabling some of the synchronization and other options.  But even this approach might not work—it’s possible that any device where local Mesh data is to be stored needs to be registered with that Live account and so your users wouldn’t normally have the authorization to join their devices to your central mesh.

Three Takeaways

Arash listed three main takeaways from this talk:

  • Live Services add value to different types of applications
  • Mesh-enabled web apps extend web sites to the desktop
  • The Live Framework is a standards-based API, available to all types of apps

How It All Works

The basic idea is to start with a “Mesh-enabled” web site.  This is a web site that delivers content from the user’s Mesh, including things like contacts and files.  Additionally, the web application could store all of its data in the Mesh, rather than on the web server where it is hosted.

Once a web application is Mesh-enabled, you have the ability to run it on various devices.  You basically create a local client, targeted at a particular platform, and have it work with data through the Live Framework API.  It typically ends up working with a cached local copy of the data, and the data is automatically synchronized to the cloud and then to other devices that are running the same application.

This basically implements the Mesh vision that Ray Ozzie presented first at Mix 2008 in Las Vegas and then again at PDC 2008 in Los Angeles.  The idea is that we move a user’s data into the cloud and then the data follows them around, no matter what device they’re currently working on.  The Mesh knows about a user’s:

  • Devices
  • Data, including special data like Contacts
  • Applications
  • Social graph  (friends)

The User’s Perspective

The user must do a few things to get your application or web site pointing at his data.  As a developer, you send him a link that lets him go sign up for a Live.com account and then register your application in his Mesh.  Registration, from the user’s perspective, is a way for him to authorize your application to access his Mesh-based data.

Again, it’s required that the user has, or get, a Live.com account.  That’s sort of the whole idea—we’re talking about developing applications that run on the Live platform.

Run Anywhere

There is still work to be done, on the part of the developer, to be able to run the application on various devices, but the basic choices are:

  • Locally, on a client PC or Mac
  • In a web browser, anywhere
  • From the Live Desktop itself, which is hosted in a browser

(In the future, with Silverlight adding support for running Silverlight-sandboxed apps outside of the browser, we can imagine that as a fourth option for Mesh-enabled applications).

The Developer’s Perspective

In addition to building the mesh application, the developer must also register his application, using the Azure Services Portal.  Under the covers, the application is just an Azure-based service.  And so you can leverage the Azure standard goodies, like running your service in a test mode and then deploying/publishing it.

One other feature that is made available to Mesh application authors is the ability to view basic Analytics data for your application.  Because the underlying Mesh service is aware of your application, wherever it is running, it can collect data about usage.    The data is “anonymized”, so you can’t see data about individual users, but can view general metrics.

Arash talked in some detail about the underlying security model, showing how a token is granted to the user/application.

Conclusions

Mesh obviously seems a good fit for some applications.  You can basically run on a platform that gives you a lot of services, as well as access to useful data that may already exist in a particular user’s Mesh environment.  There is also some potential for cross-application sharing of data, once common data models are agreed upon.

But the choice to develop on the Mesh platform implies a decision to sign your users up as part of the Mesh ecosystem.  While the programming APIs are entirely open, using basic HTTP/REST protocols, the platform itself is owned/hosted/run exclusively by Microsoft.

Not all of your users will want to go through the hassle of setting up a Live account in order to use your application.  What makes it a little worse is that the process for them is far from seamless.  It would be easier if you could hide the Live branding and automate some of the configuration.  But the user must actually log into Live.com and authorize your application.  This is a pretty high barrier, in terms of usability, for some users.

This also means that your app is betting on the success of the Live.com platform itself.  If the platform doesn’t become widely adopted, few users will live (pardon the pun) in that environment.  And the value of hosting your application in that ecosystem becomes less clear.

It remains to be seen where Live as a platform will go.  The tools and the programming models are rich and compelling.  But whether the platform will live up to Ozzie’s vision is still unclear.

Posted in Mesh, PDC 2008 | Tagged: , , , , , , | No Comments »

Session – Windows Azure Tables: Programming Cloud Table Storage

Posted by Sean on 31 October, 2008

PDC 2008, Day #3, Session #3, 1 hr 15 mins

Pablo Castro, Niranjan Nilakantan

Pablo and Niranjan did a session that went into some more detail on how the Azure Table objects can be used store data in the cloud.

Context

This talk dealt with the “Scalable Storage” part of the new Azure Services platform.  Scalable Storage is a mechanism by which applications can store data “in the cloud” in a highly scalable manner.

Data Types

There are three fundamental data types available to applications using Azure Storage Services:

  • Blobs
  • Tables
  • Queues

This session focused mainly on Tables.  Specifically, Niranjan and Pablo addressed the different ways that an application might access the storage service programmatically.

Tables

Tables are a “massively scalable” data type for cloud-based storage.  They are able to store billions of rows, are highly available, and “durable”.  The Azure platform takes care of scaling out the data automatically to multiple servers, if necessary.  (With some hints on the part of the developer).

Programming Model

Azure Storage Services are accessed through the ADO.NET Data Services (Astoria).  Using ADO.NET Data Sercices, there are basically two ways for an application to access the service.

  • .NET API   (System.Data.Services.Client)
  • REST interface   (using HTTP URIs directly)

Data Model

It’s important to note that Azure knows nothing about your data model.  It does not store data in a relational database or access it via a relational model.  Rather, you specify a Table that you’d like to store data in, along with a simple query expression for the data that you’d like to retrieve.

A Table represents a single Entity and is composed of a collection of rows.  Each row is uniquely defined by a Row Key, which the developer specifies.  Additionally, the developer specifies a Partition Key, which is used by Azure in knowing how to split the data across multiple servers.

Beyond the Record Key and Partition Key, the developer can add any other properties that she likes, up to a total of 255 properties.  While the Record and Partition Keys must be string data types, the other properties support other data types.

Partitioning

Azure storage services are meant to be automatically scalable, meaning that the data will be automatically spread across multiple servers, as needed.

In order to know how to split up the data, Azure uses a developer-specified Partition Key, which is one of the properties of each record.  (Think “field” or “column”).

The developer should pick a partition key that makes sense for his application.  It’s important to remember two things:

  • Querying for all data having a single value for a partition key is cheap
  • Querying for data having multiple partition key values is more expensive

For example, if your application often retrieves data by date and shows data typically for a single day, then it would make sense to have a CurrentData property in your data entity and to make that property the Partition Key.

The way to think of this is that each possible unique value for a Partition Key represent a “bucket” that will contain one or more records.  If you pick a key that results in only one record per bucket, that would be inefficient.  But if you pick a key that results in a set of records in the bucket that you are likely to ask for together, this will be efficient.

Accessing the Data Programmatically

Pablo demonstrated creating the classes required to access data stored in an Azure storage service.

He started by creating a class representing the data entity to be stored in a single table.  He selected and defined properties for the Partition and Record key, as well as other properties to store any other desired data in.

Pablo also recommended that you create a single class to act as an entry point into the system.  This class then acts as a service entry point for all of the data operations that your client application would like to perform.

He also demonstrated using LINQ to run queries against the Azure storage service.  LINQ automatically created to corresponding URI to retrieve, create, update, or delete the data.

Miscellaneous

Pablo and Niranjan also touched on a few other issues that most applications will deal with:

  • Dealing with concurrent updates  (uses Etag and if-match)
  • Pagination  (using continuation tokens)
  • Using Azure Queues for pseudo-transactional deletion of data

Takeaways

Pablo and Niranjan demonstrated that it was quite straightforward to access Azure storage services from a .NET application.  It’s also the case that non-.NET stacks could make use of the same services using a simple REST protocol.

It was also helpful to see how Pablo used ADO.NET Data Services to construct a service layer on top of the Azure storage services.  This seems to make consuming the data pretty straightforward.

(I still might have this a little confused—it’s possible that Astoria was just being used to wrap Azure services, rather than exposing the data in an Astoria-based service to the client.  I need to look at the examples in a little more detail to figure this out).

Original Materials

You can find the video of the session at:  http://mschnlnine.vo.llnwd.net/d1/pdc08/WMV-HQ/ES07.wmv

Posted in Azure, PDC 2008 | Tagged: , , , , , , | 1 Comment »

Session – Live Services: Live Framework Programming Model Architecture and Insights

Posted by Sean on 30 October, 2008

PDC 2008, Day #3, Session #1, 1 hr 15 mins

Ori Amiga

My next session dug a bit deeper into the Live Framework and some of the architecture related to building a Live Mesh application.

Ori Amiga was presenting, filling in for Dharma Shukla (who just became a new Dad).

Terminology

It’s still a little unclear what terminology I should be using.  In some areas, Microsoft is switching from “Mesh” to just plain “Live”.  (E.g. Mesh Operating Environment is now Live Operating Environment).  And the framework that you use to build Mesh applications is the Live Framework.  But they are still very much talking about “Mesh”-enabled applications.

I think that the way to look at is this:

  • Azure is the lowest level of cloud infrastructure stuff
  • The Live Operating Environment runs on top of Azure and provides some basic services useful for cloud applications
  • Mesh applications run on top of the LOE and provide access to a live.com user’s “mesh”: devices, applications and data that lives inside their mesh

I think that this basically means that you could have an application that makes use of the various Live Services in the Live Operating Environment without actually being a Mesh application.  On the other hand, some of the services in the LOE don’t make any sense to non-Mesh apps.

Live Operating Environment  (LOE)

Ori reviewed the Live Operating Environment, which is the runtime that Mesh applications run on top of.  Here’s a diagram from Mary Jo Foley’s blog:

This diagram sort of supports my thought that access to a user’s mesh environment is different from the basic stuff provide in the LOE.  According to this particular view, Live Services are services that provide access to the “mesh stuff”, like their contact lists, information about their devices, the data stores (data stored in the mesh or out on the devices), and other applications in that user’s mesh.

The LOE would contain all of the other stuff—basic a set of utility classes, akin to the CLR for desktop-based applications.  (Oh wait, Azure is supposed to be “akin to the CLR”). *smile*

Ori talked about a list of services that live in the LOE, including:

  • Scripting engine
  • Formatters
  • Resource management
  • FSManager
  • Peer-to-peer communications
  • HTTP communications
  • Application engine
  • Apt(?) throttle
  • Authentication/Authorization
  • Notifications
  • Device management

Here’s another view of the architecture (you can also find it here).

Also, for more information on the Live Framework, you can go here.

Data in the Mesh

Ori pointed out an important point about how Mesh applications access their data.  If you have a Mesh client running on your local PC, and you’ve set up its associated data store to synch between the cloud and that device, the application uses local data, rather than pulling data down from the cloud.  Because it’s working entirely with locally cached data, it can run faster than the corresponding web-based version (e.g. running in the Live Desktop).

Resource Scripts

Ori talked a lot about resource scripts and how they might be used by a Mesh-enabled application.  An application can perform actions in the Mesh using these resource scripts, rather than performing actions directly in the code.

The resource scripting language contains things like:

  • Control flow statements – sequence and interleaving, conditionals
  • Web operation statements – to issue HTTP POST/PUT/GET/DELETE
  • Synchronization statements – to initiate data synchronization
  • Data flow constructs – for binding statements to other statements(?)

Ori did a demo that showed off a basic script.  One of the most interesting things was how he combined sequential and interleaved statements.  The idea is that you specify what things you need to do in sequence (like getting a mesh object and then getting its children), and what things you can do in parallel (like getting a collection of separate resources).  The parallelism is automatically taken care of by the runtime.

Custom Data

Ori also talked quite a bit about how an application might view its data.  The easiest thing to do would be to simply invent your own schema and then be the only app that reads/writes the data in that schema.

A more open strategy, however, would be to create a data model that other applications could use.  Ori talked philosophically here, arguing that this openness serves to improve the ecosystem.  If you can come up with a custom data model that might be useful to other applications, they could be written to work with the same data that your application uses.

Ori demonstrated this idea of custom data in Mesh.  Basically you create a serializable class and then mark it up so that it gets stored as user data within a particular DataEntry.  (Remember: Mesh objects | Data feeds | Data entries).

This seems like an attractive idea, but it seems a bit clunky.  The custom data is embedded into the standard AtomPub stream, but not in a queryable way.  It looked more like it was jammed into an XML element in the <DataEntry> element.  This means that your custom data items would not be directly queryable.

Ori did go on to admit that custom data isn’t queryable or indexable, but really only for “lightweight data”.  This is really at odds with the philosophy of a reusable schema for other applications.

Tips & Tricks

Finally, Ori presented a handful of tips & tricks for working with Mesh applications:

  • To clean out local data cache, just delete the DB/MR/Assembler directories and re-synch
  • Local metadata is actually sotred in SQL Server Express.  Go ahead and peek at it, but be careful not to mess it up.
  • Use the Resource Model Browser to really see what’s going on under the covers.  What it shows you represents the truth of what’s happening between the client and the cloud
  • One simple way to track synch progress is to just look at the size of the Assembler and MR directories
  • Collect logs and send to Microsoft when reporting a problem

Summary

Ori finished with the following summary:

  • Think of the cloud as just a special kind of device
  • There is a symmetric cloud/client programming model
  • Everything is a Resource

Posted in Mesh, PDC 2008 | Tagged: , , , , , , , , | No Comments »