That Conference 2018 – Know What Your Code Is Doing to SQL Server

That Conference 2018, Kalahari Resort, Lake Delton, WI
Know What Your Code Is Doing to SQL Server – Kevin Boles

Day 1, 6 Aug 2018

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2018 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Kevin Boles
TheSQLGuru@gmail.com

Executive Summary

  • Handful of specific examples of ways that you can end up with poorly performing SQL Server queries
  • Improper use of Entity Framework sometimes to blame
  • Always pay attention to exact T-SQL that is being used

 

Miscellaneous

  • Recommends 192GB of RAM on the server, going beyond the max memory limit (128GB)
  • In-Memory OLTP is now available in Standard Edition
  • SQL Server 2016–It Just Runs Faster
    • Good blog posts on performance improvements
  • Get on 2016 or 2017

 

Entity Framework is a tool that simplifies development, but can lower performance

  • Can be very productive
  • Several traps, easy to fall into
  • Other ORMs have similar problems

 

Being too greedy with rows

  • EF exposes objects without knowing values
  • E.g. pulling data out to a List, then applying the Where clause to the list
    • Pulls the entire table back before filtering
  • Do the filtering on SQL Server
    • Provided that you have the proper filter
  • Or keep the result as an IQueryable so the filter composes into the SQL (see the sketch below)
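
A minimal sketch of the difference, assuming a hypothetical Pupil entity and SchoolContext (EF6):

    using System.Data.Entity;
    using System.Linq;

    public class Pupil
    {
        public int Id { get; set; }
        public string LastName { get; set; }
    }

    public class SchoolContext : DbContext
    {
        public DbSet<Pupil> Pupils { get; set; }
    }

    public static class GreedyRowsDemo
    {
        public static void Run()
        {
            using (var db = new SchoolContext())
            {
                // Too greedy: ToList() materializes the whole table first,
                // then Where() filters the rows in client memory.
                var clientSide = db.Pupils.ToList()
                                   .Where(p => p.LastName == "Smith")
                                   .ToList();

                // Better: stay on IQueryable so the predicate is translated
                // into the SQL WHERE clause and only matching rows come back.
                var serverSide = db.Pupils
                                   .Where(p => p.LastName == "Smith")
                                   .ToList();
            }
        }
    }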

 

N+1 Select Problem–minimize trips to DB

  • ANTS performance profiler
  • E.g. One query to parent entities, 2nd query for each child to get related entities
  • Round-trip for each child query
  • Lazy Loading
  • “N+1 select problem”
  • If you know you want child data ahead of time, do the original query to include it
  • Use Eager Loading
    • .Where, .Include
  • Make sure that you really need them
  • Don’t do aggregation in the client–do it in SQL Server
  • E.g. if we’re just counting child objects, do it with an aggregate function in SQL Server
  • Don’t iterate, and don’t get data that you don’t need (see the sketch below)
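
A hedged sketch of the N+1 pattern and its eager-loading fix, reusing the hypothetical SchoolContext above and assuming it also exposes a Schools set, where School has a Name and a lazy-loadable virtual Pupils collection:

    using System;
    using System.Data.Entity;   // for the Include(...) lambda overload
    using System.Linq;

    public static class NPlusOneDemo
    {
        public static void Run()
        {
            using (var db = new SchoolContext())
            {
                // N+1: one query for the schools, then another round-trip per
                // school the first time its lazy Pupils collection is touched.
                foreach (var school in db.Schools.ToList())
                    Console.WriteLine(school.Pupils.Count);

                // Eager loading: one query with a JOIN fetches parents and children.
                var schools = db.Schools.Include(s => s.Pupils).ToList();

                // If all you need is a count, aggregate on the server instead:
                var counts = db.Schools
                               .Select(s => new { s.Name, PupilCount = s.Pupils.Count() })
                               .ToList();
            }
        }
    }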

 

Being too Greedy with Columns

  • E.g. pull all Pupils from school, then iterate to dump out certain stuff
  • Selecting all columns when you just need subset of columns
  • EF doesn’t know what columns you want
  • Causes two problems
    • More data than we need
    • Impossible to index, since we pull everything back
  • SELECT ... WITH (NOLOCK)
    • If acceptable, can be faster because you don’t wait on concurrent transactions
  • If you’re pulling all columns, you’re locking the entire table for the length of that query
  • Use select new to project just the columns you want (see the sketch below)
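
A sketch of the projection fix, as a fragment inside a method using the hypothetical model above (assume Pupil also has FirstName and SchoolId, and schoolId is a local variable):

    // Too greedy: SELECTs every mapped column of Pupil.
    var whole = db.Pupils.Where(p => p.SchoolId == schoolId).ToList();

    // Projection: SELECT only the columns actually used, which also gives a
    // narrow covering index a chance to satisfy the query.
    var names = db.Pupils
                  .Where(p => p.SchoolId == schoolId)
                  .Select(p => new { p.FirstName, p.LastName })
                  .ToList();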

 

Mismatched Data Types

  • If data types don’t match, even simple queries can perform poorly
  • E.g. search for entities with particular zip code
  • Dev system should match production system in terms of scale
  • Find query runs fast, EF runs very slowly
  • Query plan warning
    • CONVERT_IMPLICIT
    • Function on a column in a WHERE clause
  • Column in the database is VARCHAR, but .NET strings are UTF-16 Unicode (NVARCHAR)
  • SQL Server does
    • INDEX SCAN of every zip code
    • Big I/O cost
    • Then converts the data–CPU hit
    • Optimizer can’t estimate rows well because the converted data type differs from what’s in the index
  • Solution
    • Edit the model to tell EF to use VARCHAR, using a column annotation
    • In code, an attribute (see the sketch below)
  • Do not let EF create your database objects (code-first)
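
One way to express the fix in code-first EF6, using a hypothetical Address entity with a varchar ZipCode column:

    using System.ComponentModel.DataAnnotations;
    using System.ComponentModel.DataAnnotations.Schema;

    public class Address
    {
        public int Id { get; set; }

        // Without this annotation EF treats .NET strings as NVARCHAR, so SQL
        // Server must CONVERT_IMPLICIT the varchar column on every comparison,
        // turning an index seek into a scan.
        [Column(TypeName = "varchar")]
        [MaxLength(10)]
        public string ZipCode { get; set; }
    }

The fluent-API equivalent would be modelBuilder.Entity<Address>().Property(a => a.ZipCode).IsUnicode(false).HasMaxLength(10);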

 

ORM ?

  • He recommends not using one

 

ADO.NET

  • Three things to remember about SQL Server advice on the internet
    • Who wrote it
    • When was it written
    • Does it apply to my system
  • ADO.NET Parameterized Queries
  • AddWithValue–get rid of it; it infers parameter types (strings become NVARCHAR), causing the implicit-conversion problem above (see the sketch below)
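
A hedged ADO.NET sketch (connectionString and zip are assumed locals, and the table name is hypothetical); the point is declaring the parameter’s exact SQL type instead of letting AddWithValue infer it:

    using System.Data;
    using System.Data.SqlClient;

    // Avoid: cmd.Parameters.AddWithValue("@zip", zip);
    // AddWithValue infers NVARCHAR for a .NET string, mismatching a VARCHAR column.

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT FirstName, LastName FROM dbo.Pupil WHERE ZipCode = @zip;", conn))
    {
        // Declare the exact type and length the column actually has.
        cmd.Parameters.Add("@zip", SqlDbType.VarChar, 10).Value = zip;

        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // ... consume rows ...
            }
        }
    }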

 

Overly Generic Queries

  • Allowing searching on multiple fields
  • So we construct one query, where clauses for each search field
  • (param == null || field matches) per search field on the C# side
  • “@param IS NULL OR column = @param” predicates get into the query
  • Bad because it builds query plan just once for all the possible search combinations
  • Always runs same query plan, no matter what you search on
  • Can’t do a seek on index
  • Options
    • Stored procedure (see Gail Shaw’s “catch-all queries” posts)
    • Conditionals on the EF side, excluding unused clauses (see the sketch below)
    • Make SQL Server recompile plans each time–from within EF. Sometimes good
    • Could write an interceptor that appends OPTION (RECOMPILE)
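
A sketch of the conditionals-on-the-EF-side option, composing only the filters the caller actually supplied so each search combination gets its own SQL text and plan (entity and variable names reuse the earlier hypothetical model, plus an assumed ZipCode property):

    IQueryable<Pupil> query = db.Pupils;

    if (!string.IsNullOrEmpty(lastName))
        query = query.Where(p => p.LastName == lastName);

    if (!string.IsNullOrEmpty(zipCode))
        query = query.Where(p => p.ZipCode == zipCode);

    // Only the predicates that were actually added appear in the WHERE clause.
    var results = query.ToList();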

 

Bloating the plan cache

  • Reuse of execution plans often a good thing, avoid regenerating plan
  • In order for a plan to be reused, the statement text must be textually identical, which, as we just saw, is the case for parameterized queries
    • Ad-hoc workloads

 

Skip / Take

  • OFFSET FETCH
  • With literal Skip/Take values the statement text changes each run, so we get a different query plan each time
  • Enable the SQL Server setting ‘optimize for ad hoc workloads’
    • Less aggressive at caching plans, generally a good thing
  • EF6: pass lambdas into Skip/Take
    • In the SQL, the values are parameterized, so we don’t recreate the cached plan (see the sketch below)
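
A sketch of the EF6 lambda overloads (in System.Data.Entity.QueryableExtensions), as a fragment with an assumed pageIndex from the caller:

    using System.Data.Entity;   // Skip/Take lambda overloads
    using System.Linq;

    int pageSize = 25;
    int skip = pageIndex * pageSize;

    // Literal overloads: the numbers are baked into the SQL text, so every
    // page bloats the plan cache with another entry.
    var pageA = db.Pupils.OrderBy(p => p.Id).Skip(skip).Take(pageSize).ToList();

    // EF6 lambda overloads: the values become parameters, so all pages share
    // one cached plan.
    var pageB = db.Pupils.OrderBy(p => p.Id)
                  .Skip(() => skip)
                  .Take(() => pageSize)
                  .ToList();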

 

Inserting data

  • EF will run separate INSERT statements for every row being added
  • Not good if you have to insert a lot of data
  • Can use EF.BulkInsert or use EF 7 (has this out-of-the-box); one plain-ADO.NET alternative is sketched below
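
The talk named EF.BulkInsert; as a plain-ADO.NET alternative, here is a hedged SqlBulkCopy sketch (the table, columns, and pupilsToInsert collection are assumptions):

    using System.Data;
    using System.Data.SqlClient;

    // Stage the rows in a DataTable and stream them in one bulk operation
    // instead of one INSERT statement per row.
    var table = new DataTable();
    table.Columns.Add("FirstName", typeof(string));
    table.Columns.Add("LastName", typeof(string));

    foreach (var p in pupilsToInsert)
        table.Rows.Add(p.FirstName, p.LastName);

    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Pupil";
        bulk.WriteToServer(table);
    }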

 

Code First, Performance Last

  • Never allow your code to create your storage constructs
  • (He means code-first being used to create or refresh database objects)

 

Other Potential Issues

  • Improper use of IN clauses
    • More than 16 values in an IN clause is bad
  • Correlated subqueries instead of JOINs
  • Parameterized queries with parameters defined to be length of data they contain
  • Chatty apps
  • Security concerns
  • Increased I/O and app bloat due to asking for too much data

 

String Properties in ADO.NET Entity Data Model

(NOTE: This is Model First development in Entity Framework–start with a data model in the Entity Framework Designer and then generate database schema from the model).

When you add a scalar property of type string to an entity data model, you can fill in the values of several other properties within Visual Studio.

  • Fixed Length – (None), True, False
  • Max Length – (None), value, Max
  • Unicode – (None), True, False


What does a value of (None) mean for these properties (when generating SQL Server Database from model)?

  • Fixed Length – (None) equivalent to False
  • Max Length – (None) equivalent to Max
  • Unicode – (None) equivalent to True

Here’s how choices for these properties map to SQL Server data types:

  • Fixed=True, Max=20 – nchar(20)
  • Fixed=(None), Max=20 – nvarchar(20)
  • Fixed=False, Max=20 – nvarchar(20)
  • Fixed=(None), Max=(None) – nvarchar(max)
  • Fixed=True, Max=(None) – nchar(4000)
  • Fixed=False, Max=(None) – nvarchar(max)
  • Fixed=(None), Max=Max – nvarchar(max)
  • Fixed=True, Max=Max – nchar(4000)
  • Fixed=False, Max=Max – nvarchar(max)
  • Fixed=(None), Max=20, Unicode=True – nvarchar(20)
  • Fixed=(None), Max=20, Unicode=False – varchar(20)

This means that if you just pick String as the type and set nothing else, you’ll get nvarchar(max) as the SQL Server type.

NOTE: The value of 4,000 is being used because the maximum length of a fixed length string in SQL Server is 4,000 characters (8,000 bytes when encoded as UTF-16).

 

That Conference 2016 – What’s that Smell? It’s your Entity Framework!

That Conference 2016, Kalahari Resort, Lake Delton, WI
What’s that Smell? It’s your Entity Framework! – Russ Thomas

Day 2, 9 Aug 2016

Disclaimer: This post contains my own thoughts and notes based on attending That Conference 2016 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of the speakers.

Executive Summary

  • Entity Framework often gets a bad rap when it appears to generate very poor T-SQL
  • You can generate far better T-SQL by paying attention to how you construct queries in LINQ
  • This talk describes several “code smells” that lead to poorly performing T-SQL and that can easily be improved
  • [Sean: Most of this also applies to LINQ to SQL]

All communication with SQL Server is via T-SQL

  • T-SQL is the only way to talk to the server
  • From T-SQL to database storage
  • Goes through Optimization (indexes, etc.)
  • There’s no such thing as a technology that bypasses T-SQL
  • Entity Framework adds 4th piece
    • EF -> T-SQL -> Optimization -> Database Storage
  • EF makes things easier because it quickly scaffolds T-SQL
  • Goal today–work towards higher performance in EF

Code smell #1 – too many columns

  • EF can bring back too many columns from database
  • Demo
    • Retrieve data from People table, do something with LastName, Firstname
    • Using simple LINQ statement, just do select p (e.g. from person table)
    • Then foreach on results, pull off LastName, FirstName
  • Look at what EF sends
    • Start up Profiler (yes, should use Extended Events)
    • Reminder: never do the trace in production
    • If writing this query manually in T-SQL, we’d do simple query
    • Look at trace from EF version of this
    • Query is selecting large list of columns–far more than we’re using
    • Can also look at Reads metric in Profiler
  • Recommendation
    • Specify exactly the columns that you want
    • Use Projection
    • instead of: select p,
    • use: select new {p.LastName, p.FirstName}
    • Goes from 544 reads to 4 reads
    • Query now looks like what we’d write ourselves

Code smell #2 – Non Set Based Work

  • Demo–Retrieving e-mails
  • Write a couple of loops, loop through People p, then loop through p.EmailAddresses (implicit join)
  • LINQ query is still returning entire People table
  • And then code is iterating through everything to get e-mail addresses
  • This results in a new T-SQL statement for each pass through the loop
  • Horrible performance
  • Don’t retrieve data procedurally
  • Relational DB works better if you retrieve using set-based query
  • Solution
    • In T-SQL, you’d just do a join
    • In LINQ
      • from people in db.People.Include(“EMailAddresses”)
      • Tells EF up front that we’ll need the e-mail addresses, so they come back in the same query
    • Even better way than .Include
      • Do join in LINQ
      • Then only a small iteration over the final results (see the sketch below)
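
A hedged sketch of the join approach, as a fragment where db exposes People and EmailAddresses sets (entity and property names are assumptions):

    // One set-based SQL statement with a JOIN, instead of a query per person.
    var pairs = (from p in db.People
                 join e in db.EmailAddresses on p.Id equals e.PersonId
                 select new { p.LastName, e.Address })
                .ToList();

    foreach (var pair in pairs)
        Console.WriteLine(pair.LastName + ": " + pair.Address);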

Code first discussion and demo

  • Most people prefer model-first, but code-first is slick
  • Can write code and then generate DB from code
  • Discussion
    • By default, it sets the column type to nvarchar(max) for string
    • And decimal(18,2) for double (e.g. a salary field)
  • Conclusion
    • If you’re going to do code-first, go further than just column names and types
    • SQL Server does care about max string length, makes educated guesses on storage
  • Better way
    • Override OnModelCreating in DbContext, use fluent syntax on modelBuilder to set things up
    • Set various attributes for each column, e.g. maximum length (see the sketch after this list)
    • Re-gen model, can see better attributes on columns
  • Thoughts
    • Code-first, often starts as simple hobby project
    • But you really do need to set proper attributes ahead of time
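
A minimal sketch of that fluent configuration (EF6), assuming a hypothetical Person entity with a string LastName and a decimal Salary:

    using System.Data.Entity;

    public class Person
    {
        public int Id { get; set; }
        public string LastName { get; set; }
        public decimal Salary { get; set; }
    }

    public class MyContext : DbContext
    {
        public DbSet<Person> People { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // Default for a string would be nvarchar(max); be explicit instead.
            modelBuilder.Entity<Person>()
                        .Property(p => p.LastName)
                        .HasMaxLength(50)
                        .IsUnicode(false);     // varchar(50)

            // Pin precision/scale rather than accepting the default.
            modelBuilder.Entity<Person>()
                        .Property(p => p.Salary)
                        .HasPrecision(9, 2);
        }
    }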

How to see what EF is doing under-the-covers

  • Profiler
    • Don’t run on production server
  • So how do we see what’s going on in EF without using profiler?
    • SqlLogger class, IDbCommandInterceptor
    • Requires EF 6
    • Can write out to a text file, capturing T-SQL statements
    • You implement different methods to capture different kinds of things
    • Can do xxxExecuting or xxxExecuted
    • Then: DbInterception.Add(new MyCommandInterceptor()); (see the sketch after this list)
  • Stuff is intercepted client-side
    • So doesn’t impact performance on server
  • [Sean: Can also set context.Database.Log property]
  • Other stuff you can do with interceptors
    • Intercept and change statement, e.g. “soft deletes” where they don’t actually delete but just set IsDeleted column
    • No trigger!
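
A minimal sketch of such an interceptor against EF6’s IDbCommandInterceptor (the log path is an assumption):

    using System.Data.Common;
    using System.Data.Entity.Infrastructure.Interception;
    using System.IO;

    // Logs every command EF sends, client-side, with no load on the server.
    public class MyCommandInterceptor : IDbCommandInterceptor
    {
        private static void Log(DbCommand command)
        {
            File.AppendAllText(@"C:\temp\ef-sql.log", command.CommandText + ";\r\n");
        }

        public void ReaderExecuting(DbCommand command,
            DbCommandInterceptionContext<DbDataReader> context) { Log(command); }
        public void ReaderExecuted(DbCommand command,
            DbCommandInterceptionContext<DbDataReader> context) { }

        public void NonQueryExecuting(DbCommand command,
            DbCommandInterceptionContext<int> context) { Log(command); }
        public void NonQueryExecuted(DbCommand command,
            DbCommandInterceptionContext<int> context) { }

        public void ScalarExecuting(DbCommand command,
            DbCommandInterceptionContext<object> context) { Log(command); }
        public void ScalarExecuted(DbCommand command,
            DbCommandInterceptionContext<object> context) { }
    }

    // Registered once at startup:
    // DbInterception.Add(new MyCommandInterceptor());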

Other comments

  • There are other optimizations you can do that have nothing to do with SQL Server

Wrapup

  • Will put zipped up demo on sqljudo.com

TechEd NA 2014 – Industrial Strength Entity Framework

TechEd North America 2014, Houston
Industrial-Strength Entity Framework – John Mason, Robert Vettor

Day 3, 14 May 2014, 5:00PM-6:15PM (DEV-B356)

Disclaimer: This post contains my own thoughts and notes based on attending TechEd North America 2014 presentations. Some content maps directly to what was originally presented. Other content is paraphrased or represents my own thoughts and opinions and should not be construed as reflecting the opinion of either Microsoft, the presenters or the speakers.

Executive Summary—Sean’s takeaways

  • Entity Data Model doesn’t have to be 1-1 mapping to database tables
    • EDM represents model that the developer works with
  • Segregate models, based on functional context
    • Complex system would typically have several models
    • Data linking accomplished by bringing in only what’s needed from another model
    • E.g. Order EDM has Customer info, but only some basic read-only Customer attributes
  • Transactions
    • EF does implicit transaction on SaveChanges
    • You can now do explicit transactions
  • Concurrency
    • You need to be thinking about it
    • Catch concurrency exceptions and do something about them (e.g. merge)
  • Performance
    • Measure performance of queries with Profiler and Query Analyzer
    • Some good tricks for using Profiler
    • Stored Procedures are fine—first-class citizens in EF
    • Use async—prevalent in EF now
    • Query caching in EF—huge gains
  • For best performance, use EF 6 and .NET 4.5+
    • Query caching requires EF 5, .NET 4.5

Full video

John Mason – Senior Developer Consultant, Microsoft Premier Services

Rob Vettor – Senior Developer Consultant, Microsoft Premier Services


Simple Layered (Multitier) Model

  • People sometimes don’t understand what’s in each box

More Complex Layered (Multitier) Model

  • Service Layer blown out
    • KPIs – business (transactions), devs (perfmon)
    • Unit of Work – how to talk to Data Layer

Complex Layered Multitier Model

  • This is more typical of modern multi-tiered system
  • Data Tier
    • Data Sources might be database or something else
    • EF will at some point be able to talk to non-relational data sources
    • “Critically important” that we accept NoSQL model
  • This is where Entity Framework fits into the large picture

Enterprise Apps are Data Aware

What is a Model?

  • EF exposes Entity Data Model
  • On the left side are database tables
  • On the right side, classes
    • Code against data model that might not map 1-1 to database
    • This is the object model
  • Default behavior is 1-1 mapping between database table and a class
  • Dev’s shape of the data vs. DBA’s shape of the data
  • If DBA changes schema, only the mapping (arrows in middle) has to change

Model Simplifies the Data

  • Model simplifies data
    • Validates your understanding
    • Stakeholders can more easily understand
    • Reduces complexity
  • Defining the model
    • Sketch data model for enterprise
    • Create initial mappings
    • Identify functional groupings
  • Abstraction has cost
    • Overhead of models
    • Overloading with entire entity graph might not make sense
    • Good idea to work with subset of entities

Model Segregation

  • Break model up into smaller functional context
    • Can have several different models (EDMs) that segment the business domain
    • Call these “functional contexts”
  • Benefits
    • Each functional context targets specific area of business domain
    • Reduces complexity
    • Functionality can evolve and refactor can occur on separate timelines

Improve Model by Refactoring

  • Customer vs. Order contexts
  • Customer context – might be where you actually change customer
  • In Order context, just needs read-only reference to customer
    • So you just pull in subset of tables/fields, read-only
  • Helps ensure that devs use the context properly
    • Avoid architectural erosion—if you don’t understand architecture, you make changes that don’t match the original vision
  • Also enables teams to work separately

Enterprise Apps are Transactional

  • Typically

Transaction

  • Are
    • Related operations, single unit of work
    • Either succeeds or fails (Commit or Rollback)
    • Guarantees data consistency / integrity
  • Basic types
    • Local
      • Single-phased (1 database) handled directly by database provider
    • Distributed
      • Affects multiple resources
      • Coordinated by 3rd party transaction manager component
      • Implements two-phase commit—present candidate changes, then go ahead and commit them

Local Transactions

  • Implicit
    • Built into EF’s internal architecture – automatic
    • SaveChanges invokes implicit transaction
      • EF automatically either commits or rolls back
  • Explicit – Two Types
    • BeginTransaction
    • UseTransaction

Demo – Transactions

  • Default behavior
    • Profiler running in background, shows SQL being used
    • After SaveChanges, do raw SQL Command—gets its own transaction
    • Another set of changes and then SaveChanges—another BEGIN TRANSACTION / COMMIT TRANSACTION
    • Doing smaller transactions is good practice for highly transactional database
  • BeginTransaction (new in EF 6)
    • ctx.Database.BeginTransaction
    • trans.Commit
    • Now we get everything within one transaction
  • UseTransaction
    • Create new connection, outside of EF
    • On Connection, BeginTransaction
    • Create EF context, pass in conn, which has transaction
    • ctx.Database.UseTransaction(trans)
    • Still assumes a single database (see the sketch after this list)
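
A hedged sketch of both explicit patterns (EF6); the context, SQL, and connectionString are hypothetical, and UseTransaction assumes the context exposes the DbContext(DbConnection, bool) constructor:

    using System.Data.SqlClient;

    // BeginTransaction: raw SQL and SaveChanges commit or roll back together.
    using (var ctx = new SchoolContext())
    using (var trans = ctx.Database.BeginTransaction())
    {
        try
        {
            ctx.Database.ExecuteSqlCommand(
                "UPDATE dbo.Pupil SET LastName = 'Smith' WHERE Id = 1;");
            ctx.SaveChanges();
            trans.Commit();
        }
        catch
        {
            trans.Rollback();
            throw;
        }
    }

    // UseTransaction: enlist EF in a transaction opened on your own connection.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        using (var ctx = new SchoolContext(conn, false))  // context does not own conn
        {
            ctx.Database.UseTransaction(tran);
            // ... make changes ...
            ctx.SaveChanges();
            tran.Commit();
        }
    }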

Best Practices – Transactions

  • Favor implicit local transactions
    • All you need, usually
    • Automatic when calling SaveChanges()
    • Implements Unit of Work pattern
  • Explicit local transaction
    • BeginTransaction, UseTransaction
  • Distributed transactions
    • Favor TransactionScope

Enterprise Apps are Concurrent

Concurrency

  • When multiple users modify data at the same time
  • Two models
    • Optimistic concurrency
      • Pro: Rows not locked between initial query and update
      • Con: Database value can be overwritten
    • Pessimistic concurrency
      • Pro: Row locked until operation complete
      • Con: Not scalable, excessive locking, lock escalation
      • Con: No out-of-box support in ADO.NET and EF

Concurrency in EF

  • Out-of-box, EF has optimistic concurrency – last in wins
  • Can detect concurrency conflicts by adding concurrency token
  • On update/delete, EF automatically adds the concurrency check to the WHERE clause
  • If concurrency conflict, EF throws DbUpdateConcurrencyException

Resolve Concurrency Conflicts

  • Client Wins
    • Out-of-box behavior
  • Server wins
    • First in stays – your changes discarded
  • Notify user of conflicts
  • Code custom algorithm
    • Let business decide
    • EF has: 1) original data; 2) your changed data; 3) data currently in database

Demo – Concurrency

  • When you catch DbUpdateConcurrencyException, good place to have a strategy pattern
  • E.g. custom merge
  • GetDatabaseValues, OriginalValues, CurrentValues
  • Merge is sometimes possible (if we’re not changing the same thing)
  • On merge failure (true conflict), we log it (see the sketch below)
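
A hedged sketch of catching the exception and choosing a resolution (EF6), inside a method where ctx is an active DbContext and the entity carries a concurrency token such as a [Timestamp] rowversion column:

    using System.Data.Entity.Infrastructure;
    using System.Linq;

    try
    {
        ctx.SaveChanges();
    }
    catch (DbUpdateConcurrencyException ex)
    {
        var entry = ex.Entries.Single();

        var databaseValues = entry.GetDatabaseValues();   // what's in the DB now
        var originalValues = entry.OriginalValues;        // what we originally read
        var currentValues  = entry.CurrentValues;         // what we tried to write

        // One strategy: refresh the original values so the next SaveChanges
        // passes the concurrency check, then merge or log.
        entry.OriginalValues.SetValues(databaseValues);
        // ... custom merge when the changes don't collide; log a true conflict ...
    }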

“Programming to DBContext” – Rowan Miller

Enterprise Apps are Performant

Performance in Apps

  • Defining performance
    • Emergent property, blends latency and underlying infrastructure
  • Measuring performance
    • Can’t improve what you don’t measure
    • Business KPIs and application KPIs
    • Use application KPIs to keep track of application metrics that are meaningful to you
    • Who falls over first: business tier, database, etc.
  • Performance impactors
    • Can buy your way out of bandwidth issues
    • Can’t buy a fix for latency
    • Must be considered up front

Performance rules

  • Choose right architecture
  • Test early/often
  • Whenever you make a change, re-test

Performance Pillar – Efficient Queries

Default Query Behavior

  • LINQ query eventually transformed into SQL query
  • EF materializes untyped SQL result set into strongly-typed entities

Query Performance

  • Is Your SQL Efficient?
    • Customers assume that LINQ is smarter than it actually is
    • Looks like SQL but doesn’t necessarily act like SQL
    • EF may not be efficient
  • Must Profile LINQ Queries
    • Structure of LINQ expressions can have tremendous impact on query performance
    • EF can’t determine query efficiency from LINQ expression
  • The Truth is in the SQL execution plan
    • Actual, not estimated

In a perfect world, you’d profile all of your queries

Profiling Check List

  • Set the Application Name attribute in the connection string (see the sketch after this list)
    • Allows filtering just the SQL from your application
  • Run SQL Trace
  • Run Query Analyzer
  • SET STATISTICS TIME ON/OFF
  • Review Actual Execution Plan
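
A small sketch of the first two checklist items (server, database, and application names are placeholders); the SET STATISTICS lines are what you wrap around the captured statement in SSMS:

    // Tag the app's connections so a Profiler column filter on
    // Application Name isolates just this application's SQL.
    var connectionString =
        "Data Source=.;Initial Catalog=School;Integrated Security=True;" +
        "Application Name=MyEfApp";

    // Then, in a new SSMS query window:
    //   SET STATISTICS TIME ON;
    //   -- paste the statement captured from Profiler here
    //   SET STATISTICS TIME OFF;
    // and enable Query | Include Actual Execution Plan before running.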

Demo – Profiling

  • Profile slow or complicated query
  • Tools | SQL Server Profiler
  • Start trace
  • Events Selection
    • No Audit, ExistingConnection,
    • Only show RPC Completed, SQL BatchCompleted
  • Column Filters
    • Application Name like “EF” (or whatever you put in conn string)
  • Run
  • Run simple query
  • In Profiler, you see full SQL query
    • Tells us what application is doing—just the query
  • Copy out of Profiler
  • New Query window
  • Paste in
  • SET STATISTICS TIME ON
    • (and OFF at bottom)
  • Select database
  • Query | Include Actual Execution Plan
  • Get rows back, took 94 ms, then 35 ms
    • SQL caching
  • Look at Execution plan
    • Favor Seeks over Scans
    • Clustered Index Scan
    • Given structure of database, we can’t improve on it
  • Look at how we’d write query
    • Query plan is the same on your own query
  • You really don’t know how query works until you look at Query Execution Plan
  • If you know of a better way to get to the data, you can use a Stored Proc

Can change underlying LINQ statement

  • Sometimes can find better way

What About Stored Procedures?

  • EF Loves to Generate SQL
    • Default behavior – devs do LINQ, EF gens SQL
  • Sprocs are always an option
    • Simple to map select/update ops to stored procs
    • EF generates method from which to invoke
  • Consistency across EF
    • Same experience
    • Full change tracking
  • Right tool for the Job

“Don’t fear the SPROC”

Performance Best Practices

  • Profile early/often
    • Slow operations or complex queries
  • Must Haves…
    • Application Name conn string attribute
    • Set Statistics Time On/Off
    • Actual vs. Estimated Execution Plan
  • Don’t fear the Sproc
    • Stored procs are first-class citizens in EF
    • Customers often blend both EF queries and stored procedures in hybrid approach
    • Common customer usage pattern for large application

80/20

  • 80% CRUD simple queries
  • 20% more complex
    • Good place for stored procedure

Performance Pillar Asynchronous Operations

Async

  • EF 6 Exposes Asynchronous Operations
    • Most IQueryable<T> methods now async-enabled
    • Leverages recent Async/Await pattern
  • Fast and Fluid Experience
    • Increase client responsiveness, server scalability
    • Prevents blocking of main thread
    • *** more ***

Demo – Async

  • Method that does async call, do await
  • In async function, return Task
  • await ctx.SaveChangesAsync()
  • FirstOrDefaultAsync (see the sketch below)
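
A minimal sketch of the pattern (EF6 async methods live in System.Data.Entity; entity names reuse the earlier hypothetical model):

    using System.Data.Entity;        // FirstOrDefaultAsync, SaveChangesAsync
    using System.Threading.Tasks;

    public static async Task RenamePupilAsync(int pupilId, string newLastName)
    {
        using (var ctx = new SchoolContext())
        {
            // The calling thread is freed while SQL Server does the work.
            var pupil = await ctx.Pupils.FirstOrDefaultAsync(p => p.Id == pupilId);
            if (pupil == null) return;

            pupil.LastName = newLastName;
            await ctx.SaveChangesAsync();
        }
    }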

Performance Pillar: Caching

Performance Enhancements

  • Overall Query Construction
  • Performance Improvements
    • EF 5 and EF 6 have big perf improvements
    • And need .NET Framework 4.5+

Query Caching

  • Earlier
    • LINQ query
    • Compiler builds expression tree
    • EF parses tree and builds SQL query
    • Repeated each time query executed
  • Now, Autocompiled Queries
    • EF 5+, SQL query constructed once, stored in EF query cache and reused
    • Parameterized – same query/different parameter values
    • For your application, all users share this cache (per App Domain)
    • Supports up to 800 queries
    • Uses MRU algorithm to decide which queries to drop
  • This is (obviously) different from SQL caching

Closing Thoughts

Step 0 – Learn the Framework

  • Framework is complex
  • Allocated time to learn
    • Understand LINQ and its subtle nuances
    • Understand immediate vs. deferred execution
  • For Microsoft Premier customers, Premier EF training

Books

  • Programming Entity Framework
  • DbContext
  • Code First
  • Entity Framework 4.1: Expert’s Cookbook

http://msdn.com/

Entity Framework 6 Recipes – Rob Vettor

Session – Offline-Enabled Data Services

PDC 2008, Day #3, Session #4, 1 hr 15 mins

Pablo Castro

I attended a second session with Pablo Castro (the previous one was the session on Azure Tables).  This session focused on a future capability in ADO.NET Data Services that would allow taking data services “offline” and then occasionally synching them with the online data.

Background – Astoria

ADO.NET Data Services was recently released as part of the .NET Framework 3.5 SP1 release.  It was formerly known as project “Astoria”.

The idea of ADO.NET Data Services is to allow creating a data service that lives on the web.  Your data source can be anything—a SQL Server database, third-party database, or some local data store.  You wrap your data source using the Entity Data Model (EDM) and ADO.NET Data Services Runtime.  Your data is now available over the web using standard HTTP protocols.

Once you have a data service that is exposed on the web, you can access it from any client.  Because the service is exposed using HTTP/REST protocols, you can access your data using simple URIs.  By using URIs, you are able to create/read/update/delete your data (POST/GET/PUT/DELETE in REST terms).

Because we can access the data using HTTP, our client is not limited to a .NET application, but can be any software that knows about the web and HTTP.  So we can consume our data from Javascript, Ruby, or whatever.  And the client could be a web-based application or a thick client somewhere that has access to the web.

If your client is a .NET client, you can use the ADO.NET Data Services classes in the .NET Framework to access your data, rather than having to build up the underlying URIs.  You can also use LINQ.

So that’s the basic idea of using ADO.NET Data Services and EDM to create a data service.  For more info, see:

The Data Synchronization Landscape

Many of the technologies that made an appearance at PDC 2008 make heavy use of data synchronization.  Data synchronization is available to Azure cloud services or to Live Mesh applications.

The underlying engine that handles data synchronization is the Microsoft Sync Framework.  The Sync Framework is a synchronization platform that allows creating sync providers, sitting on top of data that needs to be synchronized, as well as sync consumers—clients that consume that data.

The basic idea with sync is that you have multiple copies of your data in different physical locations and local clients that make use of that data.  Your client would work with its own local copy of the data and then the Sync Framework would ensure that the data is synched up with all of the other copies of the data.

What’s New

This session talked about an effort to add support in ADO.NET Data Services for offline copies of your Astoria-served data, using the Sync Framework to do data synchronization.

Here are the basic pieces (I’m too lazy to draw a picture).  This is just one possible scenario, where you want to have an application that runs locally and makes use of a locally cached copy of your data, which exists in a database somewhere:

  • Data mainly “lives” in a SQL Server database.  Assume that the database itself is not exposed to the web
  • You’ve created a data service using ADO.NET Data Services and EDM that exposes your SQL Server Data to the web using a basic REST-based protocol.  You can now do standard Create/Read/Update/Delete operations through this interface
  • You might have a web application running somewhere that consumes this data.  E.g. A Web 2.0 site built using Silverlight 2, that allows viewing/modifying the data.  Note that the web server does not have a copy of your data, but goes directly to the data service to read/write its data.
  • Now you create a thick client that also wants to read/write your data.  E.g. A WPF application.  To start with, you assume that you have a live internet connection and you configure the application to read/write data directly from/to your data service

At this point, you have something that you could build today, with the tools in the .NET Framework 3.5 SP1.  You have your data out “in the cloud” and you’ve provided both rich and thin clients that can access the data.

Note: If you were smart, you would have reused lots of code between the thin (Silverlight 2) and thick (WPF) clients.  Doing this gives your users the most consistent GUI between online and offline versions.

Now comes the new stuff.  Let’s say that you have cases when you want your thick WPF client to be able to work even though the Internet connection is not present.  Reasons for doing this include:

  • You’re working on a laptop, somewhere where you don’t have an Internet connection (e.g. airplane)
  • You want the application to be more reliable—i.e. app is still usable even if the connection disappears from time to time
  • You’d like the application to be slightly better performing.  As it stands, the performance depends on network bandwidth.  (The “lunchtime slowdown” phenomenon).

Enter Astoria Offline.  This is the set of extensions to Astoria that Pablo described, which is currently not available, but planned to be in Alpha by the end of the year.

With Astoria Offline, the idea is that you get a local cache of your data on the client PC where you’re running your thick client.  Then what happens is the following:

  • Your thick (WPF) application works directly with the offline copy of the data
  • Performance is improved
  • Much more reliable—the data is always there
  • You initiate synchronization (or set it up from time to time) to synch data back to the online copy

This synchronization is accomplished using the new Astoria Offline components.  When you do synchronize, the synchronization is two-way, which means that you update both copies with any changes that have occurred since you last synched:

  • All data created locally is copied up to the online store
  • Data created online is copied down
  • Changes are reconciled—two-way
  • Deletions are reconciled—two-way

Pablo did a basic demo of this scenario and it worked just as advertised.  He showed that the client worked with the local copy of the data and that everything synched properly.  He also showed off some early tooling in Visual Studio that will automate much of the configuration that is required for all of this to work.

Interestingly, it looked like in Pablo’s example, the local copy of the data was stored in SQL Server Express.  This was a good match, because the “in the cloud” data was stored in SQL Server.

How Did They Do It?

Jump back to the description of the Microsoft Sync Framework.  Astoria Offline is using the sync framework to do all of the underlying synchronization.  They’ve written a sync provider that knows about the entity data model and interfaces between EDM and the sync framework.

Extensibility Points

I’m a little fuzzier on this area, but I think I have a general sense of what can be done.

Note that the Sync Framework itself is extensible—you can write your own sync providers, providing synchronized access to any data store that you care to support.  Once you do this, you get 2-way (or more) synchronization between your islands of custom data.

But if I understood Pablo correctly, it sounds like you could do this a bit differently with Astoria Offline in place.  It seems like you could pump your custom data through the Entity Framework, by building a custom data source so that the EDM can see your data.  (EntityFrameworkSyncProvider fits in here somewhere).  I’m guessing that once you serve up your data in a relational manner to the EDM, you can then synch it using the Astoria Offline mechanisms.  Fantastic stuff!

Going Beyond Two Nodes

One could imagine going beyond just an online data source and an offline copy.  You could easily imagine topologies that had many different copies of the data, in various places, all being synched up from time to time.

Other Stuff

Pablo talked about some of the other issues that you need to think about.  Conflict detection and resolution is a big one.  What if two clients both update the same piece of data at the same time?  Classic synchronization issue.

The basic things to know about conflicts, in Astoria Offline, are:

  • Sync Framework provides a rich model for detecting/resolving conflicts, under the covers
  • Astoria Offline framework will detect conflicts
  • The application provides “resolution handlers” to dictate how to resolve the conflict
    • Could be locally—e.g. ask the user what to do
    • Or online—automatic policies

Pablo also talked briefly about the idea of Incremental Synchronization.  The idea is that you might want to synch things a little bit at a time, in a batch-type environment.

There was a lot more stuff here, and a lot to learn.  Much of the concepts just bubble up from the Sync Framework.

Takeaways

Astoria Offline is potentially game-changing.  In my opinion, of the new technologies presented at PDC 2008, Astoria Offline is the one most likely to change the landscape.  In the past, vendors have generally had to pick between working with live data or local data.  Now they can do both.

In the past, the online vs. offline data choice was driven by whether you needed to share the data across multiple users.  So the only apps that went with offline data were the ones that didn’t need to share their data.  What’s interesting about Astoria Offline is that these apps/scenarios can now use this solution to leave their data essentially local, but make the data more mobile, across devices.  Imagine an application that just stores local data that only it consumes.  But now if you want to run that app on multiple machines, you have to manually copy the data—or move it to a share seen by both devices.  With Astoria Offline, you can set up a sync to an online location that each device synchs to, as needed.  So you can just move from device to device and your data will just follow you.  So you can imagine that this makes it much easier to move apps out to mobile devices.

This vision is very similar to what Live Mesh and Live Services promise.  But the difference is that here you don’t need to subscribe to your app and its data living in the MS Live space.  Your data can be in whatever format you like, and nobody needs to sign up with MS Live.

When Can I Get It?

Pablo mentioned a basic timeline:

  • Early Alpha by the end of the year
  • CTPs later, i.e. next year

Resources

In addition to the links I listed above, you might also check out: