That Conference 2018 – Know What Your Code Is Doing to SQL Server

That Conference 2018, Kalahari Resort, Lake Delton, WI
Know What Your Code Is Doing to SQL Server – Kevin Boles

Day 1, 6 Aug 2018

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Kevin Boles
TheSQLGuru@gmail.com

Executive Summary

  • Handful of specific examples of ways that you can end up with poorly performing SQL Server queries
  • Improper use of Entity Framework sometimes to blame
  • Always pay attention to exact T-SQL that is being used

 

Miscellaneous

  • Recommends 192GB of RAM on the server, going beyond the max memory limit (128GB)
  • In-Memory OLTP is now available in Standard Edition
  • SQL Server 2016–It Just Runs Faster
    • Good blog posts on performance improvements
  • Get on 2016 or 2017

 

Entity Framework is a tool that simplifies development, but can lower performance

  • Can be very productive
  • Several traps, easy to fall into
  • Other ORMs have similar problems

 

Being too greedy with rows

  • EF exposes objects without knowing values
  • E.g. pulling data out to a List, then applying the Where clause to the list
    • Pulls the entire table back before filtering
  • Do the filtering on SQL Server
    • Provided that you have the proper filter
  • Or keep the result as an IQueryable so the filter composes into the SQL (see the sketch below)
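
A minimal sketch of the difference, assuming a hypothetical Pupil entity and SchoolContext (EF6):

    using System.Data.Entity;
    using System.Linq;

    public class Pupil
    {
        public int Id { get; set; }
        public string LastName { get; set; }
    }

    public class SchoolContext : DbContext
    {
        public DbSet<Pupil> Pupils { get; set; }
    }

    public static class GreedyRowsDemo
    {
        public static void Run()
        {
            using (var db = new SchoolContext())
            {
                // Too greedy: ToList() materializes the whole table first,
                // then Where() filters the rows in client memory.
                var clientSide = db.Pupils.ToList()
                                   .Where(p => p.LastName == "Smith")
                                   .ToList();

                // Better: stay on IQueryable so the predicate is translated
                // into the SQL WHERE clause and only matching rows come back.
                var serverSide = db.Pupils
                                   .Where(p => p.LastName == "Smith")
                                   .ToList();
            }
        }
    }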

 

N+1 Select Problem–minimize trips to DB

  • ANTS performance profiler
  • E.g. One query to parent entities, 2nd query for each child to get related entities
  • Round-trip for each child query
  • Lazy Loading
  • “N+1 select problem”
  • If you know you want child data ahead of time, do the original query to include it
  • Use Eager Loading
    • .Where, .Include
  • Make sure that you really need them
  • Don’t do aggregation in the client–do it in SQL Server
  • E.g. if we’re just counting child objects, do it with an aggregate function in SQL Server
  • Don’t iterate, and don’t get data that you don’t need (see the sketch below)
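
A hedged sketch of the N+1 pattern and its eager-loading fix, reusing the hypothetical SchoolContext above and assuming it also exposes a Schools set, where School has a Name and a lazy-loadable virtual Pupils collection:

    using System;
    using System.Data.Entity;   // for the Include(...) lambda overload
    using System.Linq;

    public static class NPlusOneDemo
    {
        public static void Run()
        {
            using (var db = new SchoolContext())
            {
                // N+1: one query for the schools, then another round-trip per
                // school the first time its lazy Pupils collection is touched.
                foreach (var school in db.Schools.ToList())
                    Console.WriteLine(school.Pupils.Count);

                // Eager loading: one query with a JOIN fetches parents and children.
                var schools = db.Schools.Include(s => s.Pupils).ToList();

                // If all you need is a count, aggregate on the server instead:
                var counts = db.Schools
                               .Select(s => new { s.Name, PupilCount = s.Pupils.Count() })
                               .ToList();
            }
        }
    }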

 

Being too Greedy with Columns

  • E.g. pull all Pupils from school, then iterate to dump out certain stuff
  • Selecting all columns when you just need subset of columns
  • EF doesn’t know what columns you want
  • Causes two problems
    • More data than we need
    • Impossible to index, since we pull everything back
  • SELECT ... WITH (NOLOCK)
    • If acceptable, can be faster because you don’t wait on concurrent transactions
  • If you’re pulling all columns, you’re locking the entire table for the length of that query
  • Use select new to project just the columns you want (see the sketch below)
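
A sketch of the projection fix, as a fragment inside a method using the hypothetical model above (assume Pupil also has FirstName and SchoolId, and schoolId is a local variable):

    // Too greedy: SELECTs every mapped column of Pupil.
    var whole = db.Pupils.Where(p => p.SchoolId == schoolId).ToList();

    // Projection: SELECT only the columns actually used, which also gives a
    // narrow covering index a chance to satisfy the query.
    var names = db.Pupils
                  .Where(p => p.SchoolId == schoolId)
                  .Select(p => new { p.FirstName, p.LastName })
                  .ToList();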

 

Mismatched Data Types

  • If data types don’t match, even simple queries can perform poorly
  • E.g. search for entities with particular zip code
  • Dev system should match production system in terms of scale
  • Find query runs fast, EF runs very slowly
  • Query plan warning
    • CONVERT_IMPLICIT
    • Function on a column in a WHERE clause
  • Column in the database is VARCHAR, but .NET strings are UTF-16 Unicode (NVARCHAR)
  • SQL Server does
    • INDEX SCAN of every zip code
    • Big I/O cost
    • Then converts the data–CPU hit
    • Optimizer can’t estimate rows well because the converted data type differs from what’s in the index
  • Solution
    • Edit the model to tell EF to use VARCHAR, using a column annotation
    • In code, an attribute (see the sketch below)
  • Do not let EF create your database objects (code-first)
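
One way to express the fix in code-first EF6, using a hypothetical Address entity with a varchar ZipCode column:

    using System.ComponentModel.DataAnnotations;
    using System.ComponentModel.DataAnnotations.Schema;

    public class Address
    {
        public int Id { get; set; }

        // Without this annotation EF treats .NET strings as NVARCHAR, so SQL
        // Server must CONVERT_IMPLICIT the varchar column on every comparison,
        // turning an index seek into a scan.
        [Column(TypeName = "varchar")]
        [MaxLength(10)]
        public string ZipCode { get; set; }
    }

The fluent-API equivalent would be modelBuilder.Entity<Address>().Property(a => a.ZipCode).IsUnicode(false).HasMaxLength(10);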

 

ORM ?

  • He recommends not using one

 

ADO.NET

  • Three things to remember about SQL Server advice on the internet
    • Who wrote it
    • When was it written
    • Does it apply to my system
  • ADO.NET Parameterized Queries
  • AddWithValue–get rid of it; it infers parameter types (strings become NVARCHAR), causing the implicit-conversion problem above (see the sketch below)
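
A hedged ADO.NET sketch (connectionString and zip are assumed locals, and the table name is hypothetical); the point is declaring the parameter’s exact SQL type instead of letting AddWithValue infer it:

    using System.Data;
    using System.Data.SqlClient;

    // Avoid: cmd.Parameters.AddWithValue("@zip", zip);
    // AddWithValue infers NVARCHAR for a .NET string, mismatching a VARCHAR column.

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT FirstName, LastName FROM dbo.Pupil WHERE ZipCode = @zip;", conn))
    {
        // Declare the exact type and length the column actually has.
        cmd.Parameters.Add("@zip", SqlDbType.VarChar, 10).Value = zip;

        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // ... consume rows ...
            }
        }
    }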

 

Overly Generic Queries

  • Allowing searching on multiple fields
  • So we construct one query, where clauses for each search field
  • (param == null || field matches) per search field on the C# side
  • “@param IS NULL OR column = @param” predicates get into the query
  • Bad because it builds query plan just once for all the possible search combinations
  • Always runs same query plan, no matter what you search on
  • Can’t do a seek on index
  • Options
    • Stored procedure (see Gail Shaw’s “catch-all queries” posts)
    • Conditionals on the EF side, excluding unused clauses (see the sketch below)
    • Make SQL Server recompile plans each time–from within EF. Sometimes good
    • Could write an interceptor that appends OPTION (RECOMPILE)
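
A sketch of the conditionals-on-the-EF-side option, composing only the filters the caller actually supplied so each search combination gets its own SQL text and plan (entity and variable names reuse the earlier hypothetical model, plus an assumed ZipCode property):

    IQueryable<Pupil> query = db.Pupils;

    if (!string.IsNullOrEmpty(lastName))
        query = query.Where(p => p.LastName == lastName);

    if (!string.IsNullOrEmpty(zipCode))
        query = query.Where(p => p.ZipCode == zipCode);

    // Only the predicates that were actually added appear in the WHERE clause.
    var results = query.ToList();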

 

Bloating the plan cache

  • Reuse of execution plans often a good thing, avoid regenerating plan
  • In order for a plan to be reused, the statement text must be textually identical, which, as we just saw, is the case for parameterized queries
    • Ad-hoc workloads

 

Skip / Take

  • OFFSET FETCH
  • With literal Skip/Take values the statement text changes each run, so we get a different query plan each time
  • Enable the SQL Server setting ‘optimize for ad hoc workloads’
    • Less aggressive at caching plans, generally a good thing
  • EF6: pass lambdas into Skip/Take
    • In the SQL, the values are parameterized, so we don’t recreate the cached plan (see the sketch below)
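
A sketch of the EF6 lambda overloads (in System.Data.Entity.QueryableExtensions), as a fragment with an assumed pageIndex from the caller:

    using System.Data.Entity;   // Skip/Take lambda overloads
    using System.Linq;

    int pageSize = 25;
    int skip = pageIndex * pageSize;

    // Literal overloads: the numbers are baked into the SQL text, so every
    // page bloats the plan cache with another entry.
    var pageA = db.Pupils.OrderBy(p => p.Id).Skip(skip).Take(pageSize).ToList();

    // EF6 lambda overloads: the values become parameters, so all pages share
    // one cached plan.
    var pageB = db.Pupils.OrderBy(p => p.Id)
                  .Skip(() => skip)
                  .Take(() => pageSize)
                  .ToList();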

 

Inserting data

  • EF will run separate INSERT statements for every row being added
  • Not good if you have to insert a lot of data
  • Can use EF.BulkInsert or use EF 7 (has this out-of-the-box); one plain-ADO.NET alternative is sketched below
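
The talk named EF.BulkInsert; as a plain-ADO.NET alternative, here is a hedged SqlBulkCopy sketch (the table, columns, and pupilsToInsert collection are assumptions):

    using System.Data;
    using System.Data.SqlClient;

    // Stage the rows in a DataTable and stream them in one bulk operation
    // instead of one INSERT statement per row.
    var table = new DataTable();
    table.Columns.Add("FirstName", typeof(string));
    table.Columns.Add("LastName", typeof(string));

    foreach (var p in pupilsToInsert)
        table.Rows.Add(p.FirstName, p.LastName);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Pupil";
        bulk.WriteToServer(table);
    }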

 

Code First, Performance Last

  • Never allow your code to create your storage constructs
  • (He means code-first being used to create or refresh database objects)

 

Other Potential Issues

  • Improper use of IN clauses
    • More than 16 values in an IN clause is bad
  • Correlated subqueries instead of JOINs
  • Parameterized queries with parameters defined to be length of data they contain
  • Chatty apps
  • Security concerns
  • Increased I/O and app bloat due to asking for too much data

 

String Properties in ADO.NET Entity Data Model

(NOTE: This is Model First development in Entity Framework–start with a data model in the Entity Framework Designer and then generate database schema from the model).

When you add a scalar property of type string to an entity data model, you can fill in the values of several other properties within Visual Studio.

  • Fixed Length – (None), True, False
  • Max Length – (None), value, Max
  • Unicode – (None), True, False


What does a value of (None) mean for these properties (when generating SQL Server Database from model)?

  • Fixed Length – (None) equivalent to False
  • Max Length – (None) equivalent to Max
  • Unicode – (None) equivalent to True

Here’s how choices for these properties map to SQL Server data types:

  • Fixed=True, Max=20 – nchar(20)
  • Fixed=(None), Max=20 – nvarchar(20)
  • Fixed=False, Max=20 – nvarchar(20)
  • Fixed=(None), Max=(None) – nvarchar(max)
  • Fixed=True, Max=(None) – nchar(4000)
  • Fixed=False, Max=(None) – nvarchar(max)
  • Fixed=(None), Max=Max – nvarchar(max)
  • Fixed=True, Max=Max – nchar(4000)
  • Fixed=False, Max=Max – nvarchar(max)
  • Fixed=(None), Max=20, Unicode=True – nvarchar(20)
  • Fixed=(None), Max=20, Unicode=False – varchar(20)

This means that if you just pick String as the type and set nothing else, you’ll get nvarchar(max) as the SQL Server type.

NOTE: The value of 4,000 is being used because the maximum length of a fixed length string in SQL Server is 4,000 characters (8,000 bytes when encoded as UTF-16).

 

That Conference 2016 – What’s that Smell? It’s your Entity Framework!

That Conference 2016, Kalahari Resort, Lake Delton, WI
What’s that Smell? It’s your Entity Framework! – Russ Thomas

Day 2, 9 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Entity Framework often gets a bad rap when it appears to generate very poor T-SQL
  • You can generate far better T-SQL by paying attention to how you construct queries in LINQ
  • This talk describes several “code smells” that lead to poorly performing T-SQL and that can easily be improved
  • [Sean: Most of this also applies to LINQ to SQL]

All communication with SQL Server is via T-SQL

  • T-SQL is the only way to talk to the server
  • From T-SQL to database storage
  • Goes through Optimization (indexes, etc.)
  • There’s no such thing as a technology that bypasses T-SQL
  • Entity Framework adds 4th piece
    • EF -> T-SQL -> Optimization -> Database Storage
  • EF makes things easier because it quickly scaffolds T-SQL
  • Goal today–work towards higher performance in EF

Code smell #1 – too many columns

  • EF can bring back too many columns from database
  • Demo
    • Retrieve data from People table, do something with LastName, Firstname
    • Using simple LINQ statement, just do select p (e.g. from person table)
    • Then foreach on results, pull off LastName, FirstName
  • Look at what EF sends
    • Start up Profiler (yes, should use Extended Events)
    • Reminder: never do the trace in production
    • If writing this query manually in T-SQL, we’d do simple query
    • Look at trace from EF version of this
    • Query is selecting large list of columns–far more than we’re using
    • Can also look at Reads metric in Profiler
  • Recommendation
    • Specify exactly the columns that you want
    • Use Projection
    • instead of: select p,
    • use: select new {p.LastName, p.FirstName}
    • Goes from 544 reads to 4 reads
    • Query now looks like what we’d write ourselves

Code smell #2 – Non Set Based Work

  • Demo–Retrieving e-mails
  • Write a couple of loops, loop through People p, then loop through p.EmailAddresses (implicit join)
  • LINQ query is still returning entire People table
  • And then code is iterating through everything to get e-mail addresses
  • This results in a new T-SQL statement for each pass through the loop
  • Horrible performance
  • Don’t retrieve data procedurally
  • Relational DB works better if you retrieve using set-based query
  • Solution
    • In T-SQL, you’d just do a join
    • In LINQ
      • from people in db.People.Include(“EMailAddresses”)
      • Tells EF up front that we’ll need the e-mail addresses, so they come back in the same query
    • Even better way than .Include
      • Do join in LINQ
      • Then only a small iteration over the final results (see the sketch below)
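
A hedged sketch of the join approach, as a fragment where db exposes People and EmailAddresses sets (entity and property names are assumptions):

    // One set-based SQL statement with a JOIN, instead of a query per person.
    var pairs = (from p in db.People
                 join e in db.EmailAddresses on p.Id equals e.PersonId
                 select new { p.LastName, e.Address })
                .ToList();

    foreach (var pair in pairs)
        Console.WriteLine(pair.LastName + ": " + pair.Address);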

Code first discussion and demo

  • Most people prefer model-first, but code-first is slick
  • Can write code and then generate DB from code
  • Discussion
    • By default, it sets the column type to nvarchar(max) for string
    • And decimal(18,2) for double (e.g. a salary field)
  • Conclusion
    • If you’re going to do code-first, go further than just column names and types
    • SQL Server does care about max string length, makes educated guesses on storage
  • Better way
    • Override OnModelCreating in DbContext, use fluent syntax on modelBuilder to set things up
    • Set various attributes for each column, e.g. maximum length (see the sketch after this list)
    • Re-gen model, can see better attributes on columns
  • Thoughts
    • Code-first, often starts as simple hobby project
    • But you really do need to set proper attributes ahead of time
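
A minimal sketch of that fluent configuration (EF6), assuming a hypothetical Person entity with a string LastName and a decimal Salary:

    using System.Data.Entity;

    public class Person
    {
        public int Id { get; set; }
        public string LastName { get; set; }
        public decimal Salary { get; set; }
    }

    public class MyContext : DbContext
    {
        public DbSet<Person> People { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Default for a string would be nvarchar(max); be explicit instead.
            modelBuilder.Entity<Person>()
                        .Property(p => p.LastName)
                        .HasMaxLength(50)
                        .IsUnicode(false);     // varchar(50)

            // Pin precision/scale rather than accepting the default.
            modelBuilder.Entity<Person>()
                        .Property(p => p.Salary)
                        .HasPrecision(9, 2);
        }
    }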

How to see what EF is doing under-the-covers

  • Profiler
    • Don’t run on production server
  • So how do we see what’s going on in EF without using profiler?
    • SqlLogger class, IDbCommandInterceptor
    • Requires EF 6
    • Can write out to a text file, capturing T-SQL statements
    • You implement different methods to capture different kinds of things
    • Can do xxxExecuting or xxxExecuted
    • Then: DbInterception.Add(new MyCommandInterceptor()); (see the sketch after this list)
  • Stuff is intercepted client-side
    • So doesn’t impact performance on server
  • [Sean: Can also set context.Database.Log property]
  • Other stuff you can do with interceptors
    • Intercept and change statement, e.g. “soft deletes” where they don’t actually delete but just set IsDeleted column
    • No trigger!
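
A minimal sketch of such an interceptor against EF6’s IDbCommandInterceptor (the log path is an assumption):

    using System.Data.Common;
    using System.Data.Entity.Infrastructure.Interception;
    using System.IO;

    // Logs every command EF sends, client-side, with no load on the server.
    public class MyCommandInterceptor : IDbCommandInterceptor
    {
        private static void Log(DbCommand command)
        {
            File.AppendAllText(@"C:\temp\ef-sql.log", command.CommandText + ";\r\n");
        }

        public void ReaderExecuting(DbCommand command,
            DbCommandInterceptionContext<DbDataReader> context) { Log(command); }
        public void ReaderExecuted(DbCommand command,
            DbCommandInterceptionContext<DbDataReader> context) { }

        public void NonQueryExecuting(DbCommand command,
            DbCommandInterceptionContext<int> context) { Log(command); }
        public void NonQueryExecuted(DbCommand command,
            DbCommandInterceptionContext<int> context) { }

        public void ScalarExecuting(DbCommand command,
            DbCommandInterceptionContext<object> context) { Log(command); }
        public void ScalarExecuted(DbCommand command,
            DbCommandInterceptionContext<object> context) { }
    }

    // Registered once at startup:
    // DbInterception.Add(new MyCommandInterceptor());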

Other comments

  • There are other optimizations you can do that have nothing to do with SQL Server

Wrapup

  • Will put zipped up demo on sqljudo.com

TechEd NA 2014 – Industrial Strength Entity Framework

TechEd North America 2014, Houston
Industrial-Strength Entity Framework – John Mason, Robert Vettor

Day 3, 14 May 2014, 5:00PM-6:15PM (DEV-B356)

Disclaimer: This post contains my own thoughts and notes based on attending TechEd North America 2014 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of either Microsoft, the presenters or the speakers.

Executive Summary—Sean’s takeaways

  • Entity Data Model doesn’t have to be 1-1 mapping to database tables
    • EDM represents model that the developer works with
  • Segregate models, based on functional context
    • Complex system would typically have several models
    • Data linking accomplished by bringing in only what’s needed from another model
    • E.g. Order EDM has Customer info, but only some basic read-only Customer attributes
  • Transactions
    • EF does implicit transaction on SaveChanges
    • You can now do explicit transactions
  • Concurrency
    • You need to be thinking about it
    • Catch concurrency exceptions and do something about them (e.g. merge)
  • Performance
    • Measure performance of queries with Profiler and Query Analyzer
    • Some good tricks for using Profiler
    • Stored Procedures are fine—first-class citizens in EF
    • Use async—prevalent in EF now
    • Query caching in EF—huge gains
  • For best performance, use EF 6 and .NET 4.5+
    • Query caching requires EF 5, .NET 4.5

Full video

John Mason – Senior Developer Consultant, Microsoft Premier Services

Rob Vettor – Senior Developer Consultant, Microsoft Premier Services


Simple Layered (Multitier) Model

  • People sometimes don’t understand what’s in each box

More Complex Layered (Multitier) Model

  • Service Layer blown out
    • KPIs – business (transactions), devs (perfmon)
    • Unit of Work – how to talk to Data Layer

Complex Layered Multitier Model

  • This is more typical of modern multi-tiered system
  • Data Tier
    • Data Sources might be database or something else
    • EF will at some point be able to talk to non-relational data sources
    • “Critically important” that we accept NoSQL model
  • This is where Entity Framework fits into the large picture

Enterprise Apps are Data Aware

What is a Model?

  • EF exposes Entity Data Model
  • On the left side are database tables
  • On the right side, classes
    • Code against data model that might not map 1-1 to database
    • This is the object model
  • Default behavior is 1-1 mapping between database table and a class
  • Dev’s shape of the data vs. DBA’s shape of the data
  • If DBA changes schema, only the mapping (arrows in middle) has to change

Model Simplifies the Data

  • Model simplifies data
    • Validates your understanding
    • Stakeholders can more easily understand
    • Reduces complexity
  • Defining the model
    • Sketch data model for enterprise
    • Create initial mappings
    • Identify functional groupings
  • Abstraction has cost
    • Overhead of models
    • Overloading with entire entity graph might not make sense
    • Good idea to work with subset of entities

Model Segregation

  • Break model up into smaller functional context
    • Can have several different models (EDMs) that segment the business domain
    • Call these “functional contexts”
  • Benefits
    • Each functional context targets specific area of business domain
    • Reduces complexity
    • Functionality can evolve and refactor can occur on separate timelines

Improve Model by Refactoring

  • Customer vs. Order contexts
  • Customer context – might be where you actually change customer
  • In Order context, just needs read-only reference to customer
    • So you just pull in subset of tables/fields, read-only
  • Helps ensure that devs use the context properly
    • Avoid architectural erosion—if you don’t understand architecture, you make changes that don’t match the original vision
  • Also enables teams to work separately

Enterprise Apps are Transactional

  • Typically

Transaction

  • Are
    • Related operations, single unit of work
    • Either succeeds or fails (Commit or Rollback)
    • Guarantees data consistency / integrity
  • Basic types
    • Local
      • Single-phased (1 database) handled directly by database provider
    • Distributed
      • Affects multiple resources
      • Coordinated by 3rd party transaction manager component
      • Implements two-phase commit—present candidate changes, then go ahead and commit them

Local Transactions

  • Implicit
    • Built into EF’s internal architecture – automatic
    • SaveChanges invokes implicit transaction
      • EF automatically either commits or rolls back
  • Explicit – Two Types
    • BeginTransaction
    • UseTransaction

Demo – Transactions

  • Default behavior
    • Profiler running in background, shows SQL being used
    • After SaveChanges, do raw SQL Command—gets its own transaction
    • Another set of changes and then SaveChanges—another BEGIN TRANSACTION / COMMIT TRANSACTION
    • Doing smaller transactions is good practice for highly transactional database
  • BeginTransaction (new in EF 6)
    • ctx.Database.BeginTransaction
    • trans.Commit
    • Now we get everything within one transaction
  • UseTransaction
    • Create new connection, outside of EF
    • On Connection, BeginTransaction
    • Create EF context, pass in conn, which has transaction
    • ctx.Database.UseTransaction(trans)
    • Still assumes a single database (see the sketch after this list)
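
A hedged sketch of both explicit patterns (EF6); the context, SQL, and connectionString are hypothetical, and UseTransaction assumes the context exposes the DbContext(DbConnection, bool) constructor:

    using System.Data.SqlClient;

    // BeginTransaction: raw SQL and SaveChanges commit or roll back together.
    using (var ctx = new SchoolContext())
    using (var trans = ctx.Database.BeginTransaction())
    {
        try
        {
            ctx.Database.ExecuteSqlCommand(
                "UPDATE dbo.Pupil SET LastName = 'Smith' WHERE Id = 1;");
            ctx.SaveChanges();
            trans.Commit();
        }
        catch
        {
            trans.Rollback();
            throw;
        }
    }

    // UseTransaction: enlist EF in a transaction opened on your own connection.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        using (var ctx = new SchoolContext(conn, false))  // context does not own conn
        {
            ctx.Database.UseTransaction(tran);
            // ... make changes ...
            ctx.SaveChanges();
            tran.Commit();
        }
    }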

Best Practices – Transactions

  • Favor implicit local transactions
    • All you need, usually
    • Automatic when calling SaveChanges()
    • Implements Unit of Work pattern
  • Explicit local transaction
    • BeginTransaction, UseTransaction
  • Distributed transactions
    • Favor TransactionScope

Enterprise Apps are Concurrent

Concurrency

  • When multiple users modify data at the same time
  • Two models
    • Optimistic concurrency
      • Pro: Rows not locked between initial query and update
      • Con: Database value can be overwritten
    • Pessimistic concurrency
      • Pro: Row locked until operation complete
      • Con: Not scalable, excessive locking, lock escalation
      • Con: No out-of-box support in ADO.NET and EF

Concurrency in EF

  • Out-of-box, EF has optimistic concurrency – last in wins
  • Can detect concurrency conflicts by adding concurrency token
  • On update/delete, EF automatically adds the concurrency check to the WHERE clause
  • If concurrency conflict, EF throws DbUpdateConcurrencyException

Resolve Concurrency Conflicts

  • Client Wins
    • Out-of-box behavior
  • Server wins
    • First in stays – your changes discarded
  • Notify user of conflicts
  • Code custom algorithm
    • Let business decide
    • EF has: 1) original data; 2) your changed data; 3) data currently in database

Demo – Concurrency

  • When you catch DbUpdateConcurrencyException, good place to have a strategy pattern
  • E.g. custom merge
  • GetDatabaseValues, OriginalValues, CurrentValues
  • Merge is sometimes possible (if we’re not changing the same thing)
  • On merge failure (true conflict), we log it (see the sketch below)
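
A hedged sketch of catching the exception and choosing a resolution (EF6), inside a method where ctx is an active DbContext and the entity carries a concurrency token such as a [Timestamp] rowversion column:

    using System.Data.Entity.Infrastructure;
    using System.Linq;

    try
    {
        ctx.SaveChanges();
    }
    catch (DbUpdateConcurrencyException ex)
    {
        var entry = ex.Entries.Single();

        var databaseValues = entry.GetDatabaseValues();   // what's in the DB now
        var originalValues = entry.OriginalValues;        // what we originally read
        var currentValues  = entry.CurrentValues;         // what we tried to write

        // One strategy: refresh the original values so the next SaveChanges
        // passes the concurrency check, then merge or log.
        entry.OriginalValues.SetValues(databaseValues);
        // ... custom merge when the changes don't collide; log a true conflict ...
    }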

“Programming to DBContext” – Rowan Miller

Enterprise Apps are Performant

Performance in Apps

  • Defining performance
    • Emergent property, blends latency and underlying infrastructure
  • Measuring performance
    • Can’t improve what you don’t measure
    • Business KPIs and application KPIs
    • Use application KPIs to keep track of application metrics that are meaningful to you
    • Who falls over first: business tier, database, etc.
  • Performance impactors
    • Can buy your way out of bandwidth issues
    • Can’t buy a fix for latency
    • Must be considered up front

Performance rules

  • Choose right architecture
  • Test early/often
  • Whenever you make a change, re-test

Performance Pillar – Efficient Queries

Default Query Behavior

  • LINQ query eventually transformed into SQL query
  • EF materializes untyped SQL result set into strongly-typed entities

Query Performance

  • Is Your SQL Efficient?
    • Customers assume that LINQ is smarter than it actually is
    • Looks like SQL but doesn’t necessarily act like SQL
    • EF may not be efficient
  • Must Profile LINQ Queries
    • Structure of LINQ expressions can have tremendous impact on query performance
    • EF can’t determine query efficiency from LINQ expression
  • The Truth is in the SQL execution plan
    • Actual, not estimated

In a perfect world, you’d profile all of your queries

Profiling Check List

  • Set the Application Name attribute in the connection string (see the sketch after this list)
    • Allows filtering just the SQL from your application
  • Run SQL Trace
  • Run Query Analyzer
  • SET STATISTICS TIME ON/OFF
  • Review Actual Execution Plan
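
A small sketch of the first two checklist items (server, database, and application names are placeholders); the SET STATISTICS lines are what you wrap around the captured statement in SSMS:

    // Tag the app's connections so a Profiler column filter on
    // Application Name isolates just this application's SQL.
    var connectionString =
        "Data Source=.;Initial Catalog=School;Integrated Security=True;" +
        "Application Name=MyEfApp";

    // Then, in a new SSMS query window:
    //   SET STATISTICS TIME ON;
    //   -- paste the statement captured from Profiler here
    //   SET STATISTICS TIME OFF;
    // and enable Query | Include Actual Execution Plan before running.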

Demo – Profiling

  • Profile slow or complicated query
  • Tools | SQL Server Profiler
  • Start trace
  • Events Selection
    • No Audit, ExistingConnection,
    • Only show RPC Completed, SQL BatchCompleted
  • Column Filters
    • Application Name like “EF” (or whatever you put in conn string)
  • Run
  • Run simple query
  • In Profiler, you see full SQL query
    • Tells us what application is doing—just the query
  • Copy out of Profiler
  • New Query window
  • Paste in
  • SET STATISTICS TIME ON
    • (and OFF at bottom)
  • Select database
  • Query | Include Actual Execution Plan
  • Get rows back, took 94 ms, then 35 ms
    • SQL caching
  • Look at Execution plan
    • Favor Seeks over Scans
    • Clustered Index Scan
    • Given structure of database, we can’t improve on it
  • Look at how we’d write query
    • Query plan is the same on your own query
  • You really don’t know how query works until you look at Query Execution Plan
  • If you know of a better way to get to the data, you can use a Stored Proc

Can change underlying LINQ statement

  • Sometimes can find better way

What About Stored Procedures?

  • EF Loves to Generate SQL
    • Default behavior – devs do LINQ, EF gens SQL
  • Sprocs are always an option
    • Simple to map select/update ops to stored procs
    • EF generates method from which to invoke
  • Consistency across EF
    • Same experience
    • Full change tracking
  • Right tool for the Job

“Don’t fear the SPROC”

Performance Best Practices

  • Profile early/often
    • Slow operations or complex queries
  • Must Haves…
    • Application Name conn string attribute
    • Set Statistics Time On/Off
    • Actual vs. Estimated Execution Plan
  • Don’t fear the Sproc
    • Stored procs are first-class citizens in EF
    • Customers often blend both EF queries and stored procedures in hybrid approach
    • Common customer usage pattern for large application

80/20

  • 80% CRUD simple queries
  • 20% more complex
    • Good place for stored procedure

Performance Pillar Asynchronous Operations

Async

  • EF 6 Exposes Asynchronous Operations
    • Most IQueryable<T> methods now async-enabled
    • Leverages recent Async/Await pattern
  • Fast and Fluid Experience
    • Increase client responsiveness, server scalability
    • Prevents blocking of main thread
    • *** more ***

Demo – Async

  • Method that does async call, do await
  • In async function, return Task
  • await ctx.SaveChangesAsync()
  • FirstOrDefaultAsync (see the sketch below)
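
A minimal sketch of the pattern (EF6 async methods live in System.Data.Entity; entity names reuse the earlier hypothetical model):

    using System.Data.Entity;        // FirstOrDefaultAsync, SaveChangesAsync
    using System.Threading.Tasks;

    public static async Task RenamePupilAsync(int pupilId, string newLastName)
    {
        using (var ctx = new SchoolContext())
        {
            // The calling thread is freed while SQL Server does the work.
            var pupil = await ctx.Pupils.FirstOrDefaultAsync(p => p.Id == pupilId);
            if (pupil == null) return;

            pupil.LastName = newLastName;
            await ctx.SaveChangesAsync();
        }
    }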

Performance Pillar: Caching

Performance Enhancements

  • Overall Query Construction
  • Performance Improvements
    • EF 5 and EF 6 have big perf improvements
    • And need .NET Framework 4.5+

Query Caching

  • Earlier
    • LINQ query
    • Compiler builds expression tree
    • EF parses tree and builds SQL query
    • Repeated each time query executed
  • Now, Autocompiled Queries
    • EF 5+, SQL query constructed once, stored in EF query cache and reused
    • Parameterized – same query/different parameter values
    • For your application, all users share this cache (per App Domain)
    • Supports up to 800 queries
    • Uses MRU algorithm to decide which queries to drop
  • This is (obviously) different from SQL caching

Closing Thoughts

Step 0 – Learn the Framework

  • Framework is complex
  • Allocated time to learn
    • Understand LINQ and its subtle nuances
    • Understand immediate vs. deferred execution
  • For Microsoft Premier customers, Premier EF training

Books

  • Programming Entity Framework
  • DbContext
  • Code First
  • Entity Framework 4.1: Expert’s Cookbook

http://msdn.com/

Entity Framework 6 Recipes – Rob Vettor

Session – Offline-Enabled Data Services

PDC 2008, Day #3, Session #4, 1 hr 15 mins

Pablo Castro

I attended a second session with Pablo Castro (the previous one was the session on Azure Tables).  This session focused on a future capability in ADO.NET Data Services that would allow taking data services “offline” and then occasionally synching them with the online data.

Background – Astoria

ADO.NET Data Services was recently released as part of the .NET Framework 3.5 SP1 release.  It was formerly known as project “Astoria”.

The idea of ADO.NET Data Services is to allow creating a data service that lives on the web.  Your data source can be anything—a SQL Server database, third-party database, or some local data store.  You wrap your data source using the Entity Data Model (EDM) and ADO.NET Data Services Runtime.  Your data is now available over the web using standard HTTP protocols.

Once you have a data service that is exposed on the web, you can access it from any client.  Because the service is exposed using HTTP/REST protocols, you can access your data using simple URIs.  By using URIs, you are able to create/read/update/delete your data (POST/GET/PUT/DELETE in REST terms).

Because we can access the data using HTTP, our client is not limited to a .NET application, but can be any software that knows about the web and HTTP.  So we can consume our data from Javascript, Ruby, or whatever.  And the client could be a web-based application or a thick client somewhere that has access to the web.

If your client is a .NET client, you can use the ADO.NET Data Services classes in the .NET Framework to access your data, rather than having to build up the underlying URIs.  You can also use LINQ.

So that’s the basic idea of using ADO.NET Data Services and EDM to create a data service.  For more info, see:

The Data Synchronization Landscape

Many of the technologies that made an appearance at PDC 2008 make heavy use of data synchronization.  Data synchronization is available to Azure cloud services or to Live Mesh applications.

The underlying engine that handles data synchronization is the Microsoft Sync Framework.  The Sync Framework is a synchronization platform that allows creating sync providers, sitting on top of data that needs to be synchronized, as well as sync consumers—clients that consume that data.

The basic idea with sync is that you have multiple copies of your data in different physical locations and local clients that make use of that data.  Your client would work with its own local copy of the data and then the Sync Framework would ensure that the data is synched up with all of the other copies of the data.

What’s New

This session talked about an effort to add support in ADO.NET Data Services for offline copies of your Astoria-served data, using the Sync Framework to do data synchronization.

Here are the basic pieces (I’m too lazy to draw a picture).  This is just one possible scenario, where you want to have an application that runs locally and makes use of a locally cached copy of your data, which exists in a database somewhere:

  • Data mainly “lives” in a SQL Server database.  Assume that the database itself is not exposed to the web
  • You’ve created a data service using ADO.NET Data Services and EDM that exposes your SQL Server Data to the web using a basic REST-based protocol.  You can now do standard Create/Read/Update/Delete operations through this interface
  • You might have a web application running somewhere that consumes this data.  E.g. A Web 2.0 site built using Silverlight 2, that allows viewing/modifying the data.  Note that the web server does not have a copy of your data, but goes directly to the data service to read/write its data.
  • Now you create a thick client that also wants to read/write your data.  E.g. A WPF application.  To start with, you assume that you have a live internet connection and you configure the application to read/write data directly from/to your data service

At this point, you have something that you could build today, with the tools in the .NET Framework 3.5 SP1.  You have your data out “in the cloud” and you’ve provided both rich and thin clients that can access the data.

Note: If you were smart, you would have reused lots of code between the thin (Silverlight 2) and thick (WPF) clients.  Doing this gives your users the most consistent GUI between online and offline versions.

Now comes the new stuff.  Let’s say that you have cases when you want your thick WPF client to be able to work even though the Internet connection is not present.  Reasons for doing this include:

  • You’re working on a laptop, somewhere where you don’t have an Internet connection (e.g. airplane)
  • You want the application to be more reliable—i.e. app is still usable even if the connection disappears from time to time
  • You’d like the application to be slightly better performing.  As it stands, the performance depends on network bandwidth.  (The “lunchtime slowdown” phenomenon).

Enter Astoria Offline.  This is the set of extensions to Astoria that Pablo described, which is currently not available, but planned to be in Alpha by the end of the year.

With Astoria Offline, the idea is that you get a local cache of your data on the client PC where you’re running your thick client.  Then what happens is the following:

  • Your thick (WPF) application works directly with the offline copy of the data
  • Performance is improved
  • Much more reliable—the data is always there
  • You initiate synchronization (or set it up from time to time) to synch data back to the online copy

This synchronization is accomplished using the new Astoria Offline components.  When you do synchronize, the synchronization is two-way, which means that you update both copies with any changes that have occurred since you last synched:

  • All data created locally is copied up to the online store
  • Data created online is copied down
  • Changes are reconciled—two-way
  • Deletions are reconciled—two-way

Pablo did a basic demo of this scenario and it worked just as advertised.  He showed that the client worked with the local copy of the data and that everything synched properly.  He also showed off some early tooling in Visual Studio that will automate much of the configuration that is required for all of this to work.

Interestingly, it looked like in Pablo’s example, the local copy of the data was stored in SQL Server Express.  This was a good match, because the “in the cloud” data was stored in SQL Server.

How Did They Do It?

Jump back to the description of the Microsoft Sync Framework.  Astoria Offline is using the sync framework to do all of the underlying synchronization.  They’ve written a sync provider that knows about the entity data model and interfaces between EDM and the sync framework.

Extensibility Points

I’m a little fuzzier on this area, but I think I have a general sense of what can be done.

Note that the Sync Framework itself is extensible—you can write your own sync providers, providing synchronized access to any data store that you care to support.  Once you do this, you get 2-way (or more) synchronization between your islands of custom data.

But if I understood Pablo correctly, it sounds like you could do this a bit differently with Astoria Offline in place.  It seems like you could pump your custom data through the Entity Framework, by building a custom data source so that the EDM can see your data.  (EntityFrameworkSyncProvider fits in here somewhere).  I’m guessing that once you serve up your data in a relational manner to the EDM, you can then synch it using the Astoria Offline mechanisms.  Fantastic stuff!

Going Beyond Two Nodes

One could imagine going beyond just an online data source and an offline copy.  You could easily imagine topologies that had many different copies of the data, in various places, all being synched up from time to time.

Other Stuff

Pablo talked about some of the other issues that you need to think about.  Conflict detection and resolution is a big one.  What if two clients both update the same piece of data at the same time?  Classic synchronization issue.

The basic things to know about conflicts, in Astoria Offline, are:

  • Sync Framework provides a rich model for detecting/resolving conflicts, under the covers
  • Astoria Offline framework will detect conflicts
  • The application provides “resolution handlers” to dictate how to resolve the conflict
    • Could be locally—e.g. ask the user what to do
    • Or online—automatic policies

Pablo also talked briefly about the idea of Incremental Synchronization.  The idea is that you might want to synch things a little bit at a time, in a batch-type environment.

There was a lot more stuff here, and a lot to learn.  Much of the concepts just bubble up from the Sync Framework.

Takeaways

Astoria Offline is potentially game-changing.  In my opinion, of the new technologies presented at PDC 2008, Astoria Offline is the one most likely to change the landscape.  In the past, vendors have generally had to pick between working with live data or local data.  Now they can do both.

In the past, the online vs. offline data choice was driven by whether you needed to share the data across multiple users.  So the only apps that went with offline data were the ones that didn’t need to share their data.  What’s interesting about Astoria Offline is that these apps/scenarios can now use this solution to leave their data essentially local, but make the data more mobile, across devices.  Imagine an application that just stores local data that only it consumes.  But now if you want to run that app on multiple machines, you have to manually copy the data—or move it to a share seen by both devices.  With Astoria Offline, you can set up a sync to an online location that each device synchs to, as needed.  So you can just move from device to device and your data will just follow you.  So you can imagine that this makes it much easier to move apps out to mobile devices.

This vision is very similar to what Live Mesh and Live Services promise.  But the difference is that here you don’t need to subscribe to your app and its data living in the MS Live space.  Your data can be in whatever format you like, and nobody needs to sign up with MS Live.

When Can I Get It?

Pablo mentioned a basic timeline:

  • Early Alpha by the end of the year
  • CTPs later, i.e. next year

Resources

In addition to the links I listed above, you might also check out: