TechEd NA 2014 – Public Cloud Security

TechEd North America 2014, Houston
Public Cloud Security: Surviving in a Hostile Multitenant Environment – Mark Russinovich

Day 3, 14 May 2014, 3:15PM-4:30PM (DCIM-B306)

Disclaimer: This post contains my own thoughts and notes based on attending TechEd North America 2014 presentations. Some content maps directly to what was originally presented; other content is paraphrased or represents my own thoughts and opinions, and should not be construed as reflecting the opinion of either Microsoft or the presenters.

Executive Summary—Sean’s takeaways

  • To move to cloud, customers must trust us
  • Need to follow best practices to make things secure
    • At least as good as what your customers are doing
  • Makes sense to look at top threats and think about mitigating risk in each case
  • Azure does a lot of work to mitigate risk in many areas
    • Often far more than you’d do in your own organization
  • Top three threats
    • Data breach
    • Data loss
    • Account or service hijacking
  • Encryption at rest not a panacea

Full video

Mark Russinovich – Technical Fellow, Azure, Microsoft

“There is no cloud without trust”

  • Security, Availability, Reliability

Misconceptions about what it means to be secure in cloud

  • Will dispel some of the myths
  • Look at what’s behind some of the risks
  • Mitigation of risks

The Third Computing Era

  • 1st – Mainframes
  • 2nd – PCs and Servers
  • 3rd – Cloud + Mobile
  • (Lack of) Security could ruin everything

Security

  • Study after study: CIOs say they are looking at the cloud, but are worried about security
  • Top concerns
    • Security
    • Compliance
    • Loss of control

Goals of this Session

  • Identify threats
  • Discuss risk
  • Mitigate

Cloud Architecture

  • Canonical reference architecture
  • Virtualized structure
  • Datacenter facility
  • Microsoft—deployment people and DevOps
  • Customers of cloud—Enterprise, Consumer
  • Attacker

Cloud Security Alliance

  • Microsoft is a member

The Cloud Security Alliance “Notorious Nine” (what are threats to data in cloud?)

  • Periodically surveys industry
  • 2010 – Seven top threats
  • 2013 – Nine top threats
  • Mark adds 10th threat

#10 – Shared Technology Issues: Exposed Software

  • Shared code defines surface area exposed to customers
    • In public cloud, servers are homogeneous—exact same firmware
    • Hypervisor
    • Web server
    • API support libraries
  • What if there’s a vulnerability?
  • Stability and security are balanced against each other
    • Patching might bring down servers
  • Assumes infrastructure is accessible only by trusted actors
  • Corporate and legal mechanisms for dealing with attackers
  • This model is “enterprise multi-tenancy”

#10 – Shared Technology Issues: The Cloud Risk

  • A vulnerability in publicly accessible software enables attackers to puncture the cloud
    • Exposes data of other customers
    • Single incident—catastrophic loss of customer confidence
    • Potential attackers are anonymous and in diverse jurisdictions
  • “Are you doing as good a job as I’d be doing if I had the data in-house?”
  • Important (vs. Critical) – data not at risk, but confidence in Azure is critical
    • “Cloud critical”
  • “Hostile Multi-tenancy”
  • We do whatever it takes to patch immediately

#10 – Shared Technology Issues: Bottom Line

  • Enterprises and clouds exposed to this risk
  • Clouds at higher risk
    • Data from lots of customers
    • API surface is easy to get to
  • Clouds are generally better at response
    • Azure has about 1,000,000 servers
    • Can roll out a critical patch to all servers in just a couple of hours
    • Breach detection/mitigation
  • Risk matrix
    • Perceived risk—bit below average
    • Actual risk – Fairly high (Mark’s assessment)

#9 – Insufficient Due Diligence

  • Moving to cloud, but side-stepping IT processes
    • Shadow IT
    • BYOIT – bring your own IT: non-IT staff going straight to the cloud
    • IT management processes, etc., are designed for on-premises servers
  • Bottom line
    • IT must lead responsible action

#9 – Insufficient Due Diligence – Azure

  • Azure API Discovery
    • Monitors access to cloud from each device
  • SDL
  • Cloud SDL (under development)

#8 – Abuse of Cloud Services

  • Agility and scale of cloud is attractive to users
  • Use of Compute as malware platform
  • Use of storage to store and distribute illegal content
  • Use of compute to mine digital currency
    • 50,000-70,000 VMs shut down per month due to illegal activity
    • Bulk of it is for generating crypto currency
    • Top 3 countries that are doing this: Russia, Nigeria, Vietnam
    • Password for a Vietnamese pirate: mauth123 (the local equivalent of “password123”)
    • A Harvard supercomputer was mining bitcoin

#8 – Abuse of Cloud Services: It’s Happening

  • Attackers can use cloud and remain anonymous
  • Bottom line
    • Mostly cloud provider problem
    • Hurts bottom line, drives up prices
  • Using machine learning to learn how attackers are working

#7 – Malicious Insiders

  • Many cloud service provider employees have access to cloud
  • A malicious check-in rolls out immediately to everybody
  • Operators that deploy code
  • Datacenter operations personnel
  • Mitigations
    • Employee background checks
    • Limited as-needed access to production
      • No standing admin privileges
    • Controlled/monitored access to production services
  • Bottom line
    • Real risk is better understood by third-party audits

#7 – Malicious Insiders – Compliance

  • Compliance is the #1 concern for companies already running workloads in the cloud

#6 – Denial of Service

  • Public cloud is public
  • Amazon was at one point brought down by a DDoS attack
  • Your own app could get DDoS’d
  • A cloud outage is effectively a form of denial of service
  • Redundant power from two different locations to each data center
  • Even a blip in power to a data center results in a major outage of several hours
  • Mitigations
    • Cloud providers invest heavily in DDoS prevention
    • Third-party appliances that detect and divert traffic
    • We do this for our clients too
    • This catches large-scale DDoS but not smaller attacks (see the rate-limiting sketch after this list)
  • Geo-available cloud providers can provide resiliency
  • Azure
    • DDoS prevention
    • Geo-regions for failover
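
The point that provider-level scrubbing “doesn’t catch smaller things” is why application-level throttling still matters. Here’s a minimal token-bucket rate limiter sketch in Python (an illustration of the idea, not an Azure feature; all names are made up):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: permits short bursts while
    enforcing an average request rate per client."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec        # tokens added per second
        self.capacity = burst           # maximum bucket size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # caller should respond HTTP 429

# One bucket per client IP: here, 5 requests/sec with bursts up to 10.
buckets = {}
def check(client_ip):
    return buckets.setdefault(client_ip, TokenBucket(5.0, 10)).allow()
```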

#5 – Insecure Interfaces and APIs

  • Cloud is new and rapidly evolving, so lots of new API surface
  • The CSA ranks this as one of the biggest risks
  • Examples
    • Weak TLS crypto – DiagnosticMonitor.AllowInsecure….
    • Incomplete verification of encrypted content
  • Bottom line
    • Cloud providers must follow SDL
    • Customers should validate API behavior

#4 – Account or Service Traffic Hijacking

  • Account hijacking: unauthorized access to an account
  • Possible vectors
    • Weak passwords
    • Stolen passwords (e.g. Target breach)
    • People tend to use the same password everywhere, so an attacker can replay it against other services
  • Not specific to cloud
    • Cloud use may result in unmanaged credentials
    • Developers provision apps, hard-code passwords in them, and publish them
    • Lockboxes and “secret stores” address this
    • Back door: someone in DevOps gets phished, and the attacker brute-forces from there
  • Mitigations
    • Turn off unneeded endpoints
    • Strong passwords
    • Multifactor authentication (see the TOTP sketch after this list)
      • The entire world is moving to multifactor
    • Breach detection
  • Azure
    • Anti-malware
    • IP ACLs (with static IP addresses)
    • Point-to-Site, Site-to-Site, ExpressRoute
    • Azure Active Directory MFA
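
Since MFA and unmanaged credentials both come up above, here is a small server-side sketch of a TOTP second factor (RFC 6238), with the shared secret pulled from the environment instead of source code. Standard library only; this is an illustration, not Azure AD MFA, and the variable name and fallback secret are hypothetical:

```python
import base64, hashlib, hmac, os, struct, time

def totp(secret_b32, t=None, digits=6, step=30):
    """RFC 6238 TOTP: HMAC-SHA1 over the current 30-second counter."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if t is None else t) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

# The shared secret comes from the environment or a secret store, never
# from source control ("MFA_SHARED_SECRET" is a hypothetical name).
secret = os.environ.get("MFA_SHARED_SECRET", "JBSWY3DPEHPK3PXP")
submitted = input("6-digit code: ")
# compare_digest avoids leaking the match position through timing.
ok = hmac.compare_digest(totp(secret), submitted)
print("accepted" if ok else "rejected")
```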

#3 – Data Loss

  • Ways to lose data
    • Customer accidentally deletes data
    • Attacker deletes or modifies it
    • Cloud provider accidentally deletes or modifies it
    • Natural disaster
  • Mitigations
    • Customer: do point-in-time backups
    • Customer: geo-redundant storage
    • Cloud provider: deleted resource tombstoning
      • Resources can’t be permanently deleted for 90 days (a toy sketch of the idea follows this list)
  • Azure
    • Globally Replicated Storage
    • VM Capture
    • Storage snapshots
    • Azure Site Recovery
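
A toy sketch of the tombstoning idea referenced above, assuming a simple key-value store (an illustration of the concept, not Azure’s implementation): deletes are logical, recovery is possible inside the 90-day window, and physical purging happens only afterward.

```python
import time

RETENTION_SECONDS = 90 * 24 * 3600  # the 90-day tombstone window described above

class TombstoningStore:
    """Toy key-value store in which deletes only plant a tombstone;
    data stays recoverable until the retention window lapses."""

    def __init__(self):
        self._data = {}         # key -> value
        self._tombstones = {}   # key -> deletion timestamp

    def put(self, key, value):
        self._data[key] = value
        self._tombstones.pop(key, None)

    def delete(self, key):
        self._tombstones[key] = time.time()   # logical delete only

    def undelete(self, key):
        self._tombstones.pop(key, None)       # recover inside the window

    def get(self, key):
        if key in self._tombstones:
            raise KeyError(f"{key} is deleted (still recoverable)")
        return self._data[key]

    def purge_expired(self):
        """Physical deletion happens only after the retention window."""
        now = time.time()
        for key, when in list(self._tombstones.items()):
            if now - when > RETENTION_SECONDS:
                self._data.pop(key, None)
                del self._tombstones[key]
```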

#2 – Data Breaches

  • Represents collection of threats
  • Most important asset of company is the data

#2 – Data Breaches: Physical Attacks on Media

  • Threat: Attacker has physical access to data/disk
  • Mitigation: cloud provider physical controls
    • To get in data center, gate with guards
    • To get into room with servers, biometric controls
    • Disks leaving the data center are subject to very strict controls
    • Data scrubbing, with a certificate of destruction
    • SSDs never leave the data center, because they’re so hard to scrub
    • HDDs are scrubbed
  • Enhanced mitigations
    • Third-party certifications (e.g. FedRAMP)
    • Encryption at rest
  • Azure: third-party encryption

Encryption at rest

  • Two types
    • Cloud provider has keys
    • Customer has keys
  • Even when the customer holds the keys, they must hand them to the cloud for it to decrypt
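
The alternative this hints at is encrypting on the client before data ever reaches the cloud, so no key is ever handed over. A minimal sketch with the Python cryptography package (illustrative only; it also forfeits any server-side processing of the data):

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()             # stays on-premises with the customer
f = Fernet(key)

plaintext = b"customer record"
blob_to_upload = f.encrypt(plaintext)   # the cloud only ever sees ciphertext
# ... later, after fetching the blob back from cloud storage ...
assert f.decrypt(blob_to_upload) == plaintext
```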

#2 – Data Breaches: Physical Attacks on Data Transfer

  • Man-in-the-middle
  • Mitigation
    • Encrypt data between data centers
    • APIs use TLS
    • Customer uses TLS
    • Customer encrypts outside of cloud
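
In practice, “customer uses TLS” mostly means leaving certificate verification on. A standard-library sketch (the URL is a placeholder):

```python
import ssl
import urllib.request

# create_default_context() verifies the server's certificate chain and
# hostname; the mitigation evaporates if verification is disabled just
# to silence certificate errors.
ctx = ssl.create_default_context()
with urllib.request.urlopen("https://example.com/", context=ctx, timeout=10) as resp:
    body = resp.read()
```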

#2 – Data Breaches: Side-Channel Attacks

  • Threat: Collocated attacker can infer secrets from processor side-effects
  • Snooping on the processor that they’re co-located on
  • Researchers’ assumptions (unlikely in practice)
    • Attacker knows crypto code customer is using and key strength
    • Attacker can collocate on same server
    • Attacker shares same core as you
    • Customer VM continuously executes crypto code
  • Not very likely
  • Bottom line
    • Not currently a risk, in practice
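
Even if cross-VM side channels aren’t a practical risk, the timing variant of the same idea bites ordinary code, and the fix is one stdlib call. A short example (function and argument names are mine):

```python
import hmac

def tokens_match(expected: bytes, presented: bytes) -> bool:
    # Plain == returns as soon as a byte differs, leaking the match
    # position through timing; compare_digest runs in constant time.
    return hmac.compare_digest(expected, presented)
```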

#2 – Data Breaches: Logical Attack on Storage

  • Threat: attacker gains logical access to data
  • Mitigations
    • Defense-in-depth prevention
    • Monitoring/auditing
  • Encryption-at-rest not a significant mitigation
    • If they can breach logical access, they can maybe get keys too
    • The keys are there in the cloud
    • Encrypt-at-rest isn’t based on real threat modeling

#2 – Data Breaches: Bottom Line

  • Media breach not significant risk
  • Network breach is risk
  • Logical breach is a risk
    • Encrypt-at-rest doesn’t buy much

#1 – Self-Awareness

  • E.g. Skynet
  • People are actually worried about this

Session – Services Symposium: Enterprise Grade Cloud Applications

PDC 2008, Day #4, Session #2, 1 hr 30 mins

Eugenio Pace

My second session on Thursday was a continuation of the cloud services symposium from the first session.  There was a third part to the symposium, which I did not attend.

The presenter for this session, Eugenio, was not nearly as good a presenter as Gianpaolo from the previous session.  So it was a bit less dynamic, and harder to stay interested.

The session basically consisted of a single demo, which illustrated some of the possible solutions to the Identity, Monitoring, and Integration challenges mentioned in the previous session.

Identity

Eugenio pointed out the problems involved in authentication/authorization.  You don’t want to require the enterprise users to have a unique username/password combination for each service that they use.  And pushing out the enterprise (e.g. Active Directory) credential information to the third party service is not secure and creates a management nightmare.

The proposed solution is to use a central (federated) identity system to do the authentication and authorization.  This is the purpose of the Azure Access Control service.
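
The Access Control service issued signed claims tokens so the third-party service never handles enterprise credentials. The same shape is easiest to show today with a JWT and the PyJWT package; this is a modern stand-in for illustration (issuer, audience, and claims are hypothetical), not the 2008 Access Control API:

```python
import jwt  # pip install PyJWT

def validate(token, signing_key):
    """The relying service never sees the user's enterprise password; it
    only checks that a trusted issuer signed the claims for this audience."""
    return jwt.decode(
        token,
        signing_key,
        algorithms=["HS256"],
        issuer="https://sts.contoso.example",  # hypothetical federation STS
        audience="urn:my-cloud-service",       # hypothetical service identifier
    )

# A returned claims dict might look like {"sub": "alice", "role": "approver"}.
```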

Management

The next part of the demo showed how Azure supports remote management, on the part of IT staff at an individual customer site, of their instance of your application.  The basic things that you can do remotely include:

  • Active real-time monitoring of application health
  • Trigger administrative actions, based on the current state

The end result (and goal) is that you have the same scope of control over your application as you’d have if it were on premises.
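
A minimal sketch of what the monitoring half of this can look like: the service exposes its health model over HTTP and remote IT tooling polls it. Standard library only; the endpoint path and state fields are invented for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical health model: whatever state IT needs for monitoring.
STATE = {"status": "healthy", "queue_depth": 0}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps(STATE).encode()
        # A non-healthy state maps to 503 so simple pollers can alert on it.
        self.send_response(200 if STATE["status"] == "healthy" else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```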

Application Integration

Finally, Eugenio did some demos related to “process integration”—allowing your service to be called from a legacy service or system.  This demo actually woke everyone up, because Eugenio brought an archaic green-screen AS/400 system up in an emulator and proceeded to have it talk to his Azure service.

Takeaways

The conclusions were recommendations to both IT organizations and ISVs:

  • Enterprise IT organization
    • Don’t settle for sub-optimal solutions
    • Tap into the benefits of Software+Services
  • ISV
    • Don’t give them an excuse to reject your solution
    • Make use of better tools, frameworks, and services

Session – Services Symposium: Expanding Applications to the Cloud

PDC 2008, Day #4, Session #1, 1 hr 30 mins

Gianpaolo Carraro

As the last day of PDC starts, I’m down to four sessions to go.  I’ll continue doing a quick blog post on each session, where I share my notes, as well as some miscellaneous thoughts.

The Idea of a Symposium

Gianpaolo started out by explaining that they were finishing PDC by doing a pair of symposiums, each a series of three different sessions.  One symposium focused on parallel computing and the other on cloud-based services.  This particular session was the first in the set of three that addressed cloud services.

The idea of a symposium, explained Gianpaolo, is to take all of the various individual technologies and try to sort of fit the puzzle pieces together, providing a basic context.

The goal was also to present some of the experience that Microsoft has gained in early usage of the Azure platform over the past 6-12 months.  He said that he himself had spent the last 6-12 months using the new services, so he had some thoughts to share.

This first session in the symposium focused on taking existing business applications and expanding them to “the cloud”.  When should an ISV do this?  Why?  How?

Build vs. Buy and On-Premises vs. Cloud

Gianpaolo presented a nice matrix showing the two basic independent decisions that you face when looking for software to fulfill a need.

  • Build vs. Buy – Can I buy a packaged off-the-shelf product that does what I need?  Or are my needs specialized enough that I need to build my own stuff?
  • On-Premises vs. Cloud – Should I run this software on my own servers?  Or host everything up in “the cloud”?

There are, of course, tradeoffs on both sides of each decision.  These have been discussed ad infinitum elsewhere, but the basic tradeoffs are:

  • Build vs. Buy – Features vs. Cost
  • On-Premises vs. Cloud – Control vs. Economy of Scale

Here’s the graph that Gianpaolo presented, showing six different classes of software, based on how you answer these questions.  Note that on the On-Premises vs. Cloud scale, there is a middle column that represents taking applications that you essentially control and moving them to co-located servers.

This is a nice way to look at things.  It shows that, for each individual software function, it can live anywhere on this graph.  In fact, Gianpaolo’s main point is that you can deploy different pieces of your solution at different spots on the graph.

So the idea is that while you might start off on-premises, you can push your solution out to either a co-located hosting server or to the cloud in general.  This is true of both packaged apps as well as custom-developed software.

Challenges

The main challenge in moving things out of the enterprise is dealing with the various issues that show up when your data needs to cross the corporate/internet boundary.

There are several separate types of challenges that show up:

  • Identity challenges – as you move across various boundaries, how does the software know who you are and what you’re allowed to access?
  • Monitoring and Management challenges – how do you know if your application is healthy, if it’s running out in the cloud?
  • Application Integration challenge – how do various applications communicate with each other, across the various boundaries?

Solutions to the Identity Problem

Gianpaolo proposed the following possible solutions to this problem of identity moving across the different boundaries:

  • Federated ID
  • Claim-based access control
  • Geneva identity system, or CardSpace

The basic idea was that Microsoft has various assets that can help with this problem.

Solutions to the Monitoring and Management Problem

Next, the possible solutions to the monitoring and management problem included:

  • Programmatic access to a “Health” model
  • Various management APIs
  • Firewall-friendly protocols
  • PowerShell support

Solutions to the Application Integration Problem

Finally, some of the proposed solutions to the application integration problem included:

  • ServiceBus
  • Oslo
  • Azure storage
  • Sync framework

The ISV Perspective

The above issues were all from an IT perspective.  But you can look at the same landscape from the perspective of an independent software vendor, trying to sell solutions to the enterprise.

To start with, there are two fundamentally different ways that the ISV can make use of “the cloud”:

  • As a service mechanism, for delivering your services via the cloud
    • You make your application’s basic services available over the internet, no matter where it is hosted
    • This is mostly a customer choice, based on where they want to deploy
  • As a platform
    • Treating the cloud as a platform, where your app runs
    • Benefits are the economy of scale
    • Mostly an ISV choice
    • E.g. you could use Azure without your customer even being aware of it

When delivering your software as a service, you need to consider things like:

  • Is the feature set available via cloud sufficient?
  • Firewall issues
  • Need a management interface for your customers

Some Patterns

Gianpaolo presented some miscellaneous design considerations and patterns that might apply to applications deployed in the cloud.

Cloudbursting

  • Design for average load, handling the ‘peak’ as an exception
  • I.e. only go to the cloud for scalability when you need to
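
A tiny sketch of the dispatch decision behind cloudbursting (the capacity number and both handlers are hypothetical):

```python
LOCAL_CAPACITY = 100   # req/s the on-premises tier is sized for (the average load)

def handle_locally(request):
    return f"local: {request}"

def forward_to_cloud(request):
    # Hypothetical: hand off to elastic cloud capacity spun up on demand.
    return f"cloud: {request}"

def route(request, current_load):
    """Serve the average case on-premises; treat the peak as the exception,
    paying for cloud capacity only while the burst lasts."""
    if current_load <= LOCAL_CAPACITY:
        return handle_locally(request)
    return forward_to_cloud(request)
```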

Worker / Queue / Blob Pattern

Let’s say that you have a task like encoding and publishing video.  You can push the data out to the cloud, where the encoding work happens.  (The raw data is placed in a “blob” in cloud storage.)  You then add an entry to a queue, indicating that there is work to be done, and a separate worker process eventually does the encoding work.

This is a nice pattern for supporting flexible scaling—both the queues and the worker processes could be scaled out separately.
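
Here’s the pattern reduced to a runnable toy, with an in-memory dict standing in for blob storage and queue.Queue standing in for the cloud queue (the “encoding” step is faked):

```python
import queue
import threading
import uuid

blobs = {}                  # stands in for cloud blob storage
work_queue = queue.Queue()  # stands in for a cloud queue

def submit(raw_video):
    """Front end: park the raw data in a blob, then enqueue a pointer to it."""
    blob_id = str(uuid.uuid4())
    blobs[blob_id] = raw_video
    work_queue.put(blob_id)
    return blob_id

def worker():
    """Worker: drains the queue; scaled out independently of the front end."""
    while True:
        blob_id = work_queue.get()
        blobs[blob_id + ".encoded"] = b"encoded:" + blobs[blob_id]  # fake encode
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
submit(b"raw video bytes")
work_queue.join()  # block until the worker has processed everything
print(sorted(blobs))
```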

CAP: Pick 2 out of 3

  • Consistency
  • Availability
  • Tolerance to network Partitioning

Eventual Consistency (ACID – BASE)

The idea here is that we are all used to the ACID characteristics listed below.  We need to guarantee that the data is consistent and correct—which means that performance likely will suffer.  As an example, we have a process submit data synchronously because we need to guarantee that the data gets to its destination.

But Gianpaolo talked about the idea of “eventual consistency”.  For most applications, while it’s important for your data to be correct and consistent, it’s not necessarily important for it to be consistent right now.  This leads to a model that he referred to as BASE, with the characteristics listed below.

  • ACID
    • Atomicity
    • Consistency
    • Isolation
    • Durability
  • BASE
    • Basically Available
    • Soft state
    • Eventually consistent
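
A toy illustration of the BASE side: the write is acknowledged immediately (basically available), the replica lags behind (soft state), and it converges shortly afterward (eventually consistent). All of the moving parts here are simulated:

```python
import queue
import threading
import time

replica = {}                   # a follower copy of the data
replication_log = queue.Queue()

def replicator():
    while True:
        key, value = replication_log.get()
        time.sleep(0.1)        # simulated replication latency
        replica[key] = value   # the replica converges... eventually
        replication_log.task_done()

threading.Thread(target=replicator, daemon=True).start()

def base_write(key, value):
    """BASE: acknowledge immediately; replica readers may see stale data."""
    replication_log.put((key, value))
    return "accepted"          # the caller is unblocked right away

base_write("order:1", "shipped")
print(replica.get("order:1"))  # likely None: soft state, not yet consistent
replication_log.join()
print(replica.get("order:1"))  # "shipped": eventually consistent
```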

Fundamental Lesson

Basically the main takeaway is:

  • Put the software components in the place that makes the most sense, given their use

Session – A Lap Around Windows Azure, part 1

PDC 2008, Day #1, Session #1, 1 hr 15 min.

Manuvir Das

In my first PDC session, Manuvir Das gave us some more information about the Azure platform.

He started by describing the operating system as the platform for traditional desktop clients.  It provides a variety of services that all applications need in order to run.

Windows Azure serves as a similar platform for web-based applications.  In other words, Azure is the operating system for the cloud.  It provides the “glue” required for cloud-based services.

According to Manuvir, the four main features of Azure as a platform are:

  • Automated service management
  • Powerful service hosting environment
  • Scalable, available cloud storage
  • Rich, familiar developer experience

Manuvir demonstrated how an Azure-based service is defined.  The developer describes the service topology and health constraints.  And Azure automatically deploys the service in the cloud.  This means that your service runs on Microsoft’s servers and is scaled out (run on more than one server) as necessary.

As a sample topology, Manuvir showed a service application consisting of a web-facing “role” and a worker “role”.  The web-facing part of the application accepted requests from the web UI and the worker role processed the requests.  When the author configures the service, he specifies the desired number of instances for each of these roles.  Azure seamlessly creates a separate VM for each instance of each role.
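
To make the “describe the topology, let the platform place instances” idea concrete, here is a toy sketch; the dictionary format and the placement function are invented for illustration and bear no relation to Azure’s actual service definition files:

```python
# Hypothetical topology description; not Azure's service definition format.
topology = {
    "web_role":    {"instances": 3},  # accepts requests from the web UI
    "worker_role": {"instances": 5},  # processes the queued requests
}

def place(topology, servers):
    """Naive stand-in for the fabric controller: one VM per role instance,
    spread round-robin across physical servers."""
    placements, i = [], 0
    for role, spec in topology.items():
        for n in range(spec["instances"]):
            placements.append((f"{role}-{n}", servers[i % len(servers)]))
            i += 1
    return placements

print(place(topology, ["server-a", "server-b", "server-c"]))
```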

A developer can create and test all of this infrastructure entirely on a local desktop by using the Azure SDK.  The SDK simulates the cloud environment on the developer’s PC.  This is valuable for uncovering concurrency and other issues prior to deployment.

Manuvir also talked about Azure providing scalable and available cloud-based storage.  The data elements used for storage are things like Blobs, Tables, and Queues.  Future revs of the platform will support other data objects, including File Streams, Caches, and Locks.  Note that these data structures are simpler than a true relational data source.  If a relational data store is required, the service should use something like SQL Services—Microsoft’s “SQL Server in the cloud” solution.

Rollout

Windows Azure goes “live” this week, for a limited number of users.  It will likely release as a commercial service sometime in 2009.

The business model that Azure will follow will be some sort of pay-for-use model.  Manuvir said that it would be competitive with other similar services.

At some point, Azure will better support geo-distribution of Azure-based services.  It’s also expected to eventually support the ability to run native (x86) code on the virtual servers.

Demo

The obligatory demo by a partner featured Danny Kim, from Full Armor.  Danny described an application that they had created for the government of Ethiopia, that allowed them to track thousands of teachers throughout the country and push out data/documents to the teachers’ laptops.

This demo was again a bit more interesting as a RIA demo than a demo of Azure’s capabilities.  The only Azure-related portion of the demo involved Danny explaining that they were currently supporting 150 teachers, but would soon scale up to several hundred thousand teachers.  But they hadn’t yet scaled out, so Danny’s application didn’t really highlight Azure’s capabilities.