Rules to Better Entity Framework - 16 Rules
Entity Framework provides a strongly typed object framework (ORM) over your database. This helps abstract the database away, allowing it to be replaced with other technologies as needed.
using (SqlConnection conn = new SqlConnection())
{
    conn.ConnectionString = "Data Source=(local);Initial Catalog=Northwind;Integrated Security=True";
    conn.Open();
    SqlCommand cmd = conn.CreateCommand();
    cmd.CommandText = "SELECT * FROM Customers WHERE CompanyName LIKE '" + companyNameTextbox.Text + "%'";
    bindingSource1.DataSource = cmd.ExecuteReader();
}
Figure: Bad example - using ADO.NET: not strongly typed, and the string concatenation is open to SQL injection
var results = dbContext.Customers
    .Where(c => c.CompanyName.StartsWith(companyNameTextbox.Text));
customersBindingSource.DataSource = results;

// Or even
var results = dbContext.Customers
    .Where(c => c.CompanyName.StartsWith(companyNameTextbox.Text))
    .Select(c => new { c.CompanyName, c.Phone });
customersBindingSource.DataSource = results;
Figure: Good example - at least 4 good reasons below
- Making queries that are independent of the specific database engine
- Strongly typed fields - SQL tables/entities have IntelliSense
- More readable and less code
- It's easy to chain more operations such as OrderBy, GroupBy, FirstOrDefault, Count, Any and many more
- Developers are generally poor at writing SQL, so using code constructs makes writing queries much easier
Retrieving data from a database is expensive, so it's important to ask only for the rows you require.
Entity Framework nicely translates the various filters like the Where method into SQL WHERE clauses. This makes it really easy to write nice readable code that is also very efficient.
List<Sale> sales = context.Sales.ToList();

foreach (var sale in sales)
{
    if (sale.ProductId == request.ProductId)
    {
        // Do stuff
    }
}
Figure: Bad example - Retrieved all the data instead of only the items that matched the product id
List<Sale> sales = context.Sales
    .Where(sale => sale.ProductId == request.ProductId)
    .ToList();

foreach (var sale in sales)
{
    // Do stuff
}
Figure: Good example - Only the data required was retrieved from the database
When you cast IQueryable to IEnumerable and then query the data from there, Entity Framework must collect the data at the point you do the cast. This can result in very significant extra database load, and extra processing on the client side.
NOTE: Using .AsEnumerable() achieves the same effect.

Counting
// All the examples below will result in a SQL query similar to:
// SELECT * FROM Sales

// ToList generates a list of all records client side and then counts them.
int count1 = context.Sales
    .ToList()
    .Count();

// This implicitly treats the sales as an enumerable and enumerates all the items to count them.
IEnumerable<Sale> sales = context.Sales;
int count2 = sales.Count();

// EF Core will evaluate everything before `.AsEnumerable()`; after that line, everything is in-memory.
int count3 = context.Sales
    .AsEnumerable()
    .Count();

// This is the most common source of `IEnumerable` casting, which can cause significant performance issues.
public IEnumerable<Sale> GetSales() => context.Sales;

// The code at first glance looks alright, but in fact it fetches the entire table from SQL Server
// because it receives the query as `IEnumerable` before running `.Count()`.
int count4 = GetSales().Count();
Figure: Bad example - All these examples read the entire table instead of just returning the count from the database
// All of the examples below will result in the SQL query:
// SELECT COUNT(*) FROM Sales

int count1 = context.Sales.Count();

IQueryable<Sale> query = context.Sales;
int count2 = query.Count();

public IQueryable<Sale> GetSales() => context.Sales;
int count3 = GetSales().Count();
Figure: Good example - Only the count is returned by the query
Where
// Both examples below will result in a SQL query like:
// SELECT * FROM Sales

List<Sale> sales1 = context.Sales
    .AsEnumerable()
    .Where(x => x.Id == 5)
    .ToList();

private IEnumerable<Sale> Sales
{
    get { return context.Sales; }
}

List<Sale> sales2 = Sales
    .Where(x => x.Id == 5)
    .ToList();
Figure: Bad example - The whole table is returned from the database and then discarded in code
// Both examples below will result in a SQL query like:
// SELECT * FROM Sales WHERE Id = 5

List<Sale> sales1 = context.Sales
    .Where(x => x.Id == 5)
    .ToList();

private IQueryable<Sale> Sales
{
    get { return context.Sales; }
}

List<Sale> sales2 = Sales
    .Where(x => x.Id == 5)
    .ToList();
Figure: Good example - Filtering is done on the database before returning data
One of EF Core's best features is that it tracks any changes you make to entities after you retrieve them. However, this comes with a cost: if the data is only being read and the returned entities will never be modified, you can use the AsNoTracking method to tell EF Core not to bother tracking changes.
This results in fairly significant memory and CPU improvements on the client side.
return context.Sales.AsNoTracking().Where(x => x.Id == 5).ToList();
Figure: Using AsNoTracking to save CPU and memory for a read only query
When retrieving data it's much more efficient to only collect the data you need. It saves computation and IO on the database and also saves memory and CPU on the calling side.
IEnumerable<string> GetProductGuids(string category)
{
    IEnumerable<Product> products = context.Products
        .Where(x => x.Category == category)
        .ToList();

    return products.Select(x => x.ProductGuid);
}
Figure: Bad example - Retrieved the whole product record when we only needed 1 property
IEnumerable<string> GetProductGuids(string category)
{
    IEnumerable<string> productGuids = context.Products
        .Where(x => x.Category == category)
        .Select(x => x.ProductGuid)
        .ToList();

    return productGuids;
}
Figure: Good example - Retrieved only the required property.
The Update method on an entity in EF Core marks all of its fields as dirty. This will result in all the fields being written back to the database.
Writing the entire record to the database can cause locking issues in the database server if there are foreign key relationships involving the entity being modified.
var entity = context.Products
    .FirstOrDefault(item => item.ProductID == id);

if (entity != null)
{
    entity.Name = "New name";
    context.Products.Update(entity);
    context.SaveChanges();
}
Figure: Bad example - The whole record is written back to the database.
var entity = context.Products
    .FirstOrDefault(item => item.ProductID == id);

if (entity != null)
{
    entity.Name = "New name";
    context.SaveChanges();
}
Figure: Good example - Only the modified fields are written back to the database.
Often developers will include all the related entities in a query to help with debugging. Always remember to take these out. They cause excessive database load.
If you need the related entities, then that is what Include is for.
var query = _dbContext.Sales
    .Include(x => x.SalesPerson);
Figure: Bad example - Retrieved the sales records and the salesperson, even though we don't intend to use the salesperson record.
var query = _dbContext.Sales;
Figure: Good example - Retrieved only the sales records themselves
Pagination can be expensive if all the pages are retrieved from the database before grabbing the relevant page. It's much more efficient to get only the page number requested back from the database.
var query = context.Sales
    .AsNoTracking()
    .Where(x => x.SalesPersonId == salesPersonId);

var result = await query.ToListAsync();
int count = result.Count;

result = result
    .Skip(page * pageSize)
    .Take(pageSize)
    .ToList();

return (count, result);
Figure: Bad example - Reads all the data from the database, counts the records and filters down to the page
var query = context.Sales
    .AsNoTracking()
    .Where(x => x.SalesPersonId == salesPersonId);

int count = await query.CountAsync();

query = query
    .Skip(page * pageSize)
    .Take(pageSize);

var result = await query.ToListAsync();

return (count, result);
Figure: Good example - Reads only the count and 1 page of data from the database
TagWith adds comments to the generated SQL. This makes it easier to identify queries when they run on the database.
This is very useful when debugging issues as there are often multiple pieces of code that generate similar statements and as such it's hard to identify what is executing particular queries.
var list = await context.Sales
    .TagWith("Get All Sales")
    .ToListAsync(ct);
Figure: Code to add tagging
-- Get All Sales
select * from sales
Figure: SQL generated by the above code
To avoid embarrassing failures in Production, it is important to ensure that your development systems are as similar as possible to what's expected in Production.
Modifying and querying database tables is very dependent on the amount of data in the table. Often developers will run their code in a database without sufficient data in the tables and therefore the queries are nice and fast. The problem is when there's millions of transactions already in the database, all the queries turn out to be far too slow.
So it is an important part of the development process to seed your development databases with a reasonable amount of representative data.
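As a sketch of that seeding step, representative rows can be generated in plain C# and then inserted with something like context.Products.AddRange(...); the Product shape, category names, and row counts here are illustrative assumptions, not part of the original article:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shape for illustration only
public record Product(Guid ProductGuid, string Name, string Category, decimal Price);

public static class SeedData
{
    private static readonly string[] Categories = { "Books", "Games", "Tools" };

    // Generates a large, representative set of rows so queries are exercised
    // against realistic data volumes, e.g. SeedData.Generate(1_000_000).
    public static List<Product> Generate(int count, int seed = 42)
    {
        var random = new Random(seed); // Fixed seed keeps runs reproducible
        return Enumerable.Range(1, count)
            .Select(i => new Product(
                Guid.NewGuid(),
                $"Product {i}",
                Categories[random.Next(Categories.Length)],
                Math.Round((decimal)random.NextDouble() * 100, 2)))
            .ToList();
    }
}
```

Using a fixed random seed means every developer's database gets the same distribution of data, so slow queries reproduce consistently across machines.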
Benchmarking your system's performance is important: it gives you information on how your system performs with a known set of data and load.
Benchmarking allows you to optimize code and actually know that things improved.
There are plenty of good benchmarking tools for .NET solutions.
- BenchmarkDotNet is good because it monitors memory consumption and timings.
- Bombardier is a simple CLI load testing tool.
- NBomber is good for automating load tests.
- RedLine13 uses AWS spot instances to provide really cheap, enormous scale for load testing.
Try these out - there are more available. Which one suits will depend on your solution and what information you want.
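As a minimal sketch of what a BenchmarkDotNet benchmark looks like (the benchmark bodies here are illustrative stand-ins - in a real solution they would call your EF Core queries):

```csharp
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // Reports allocations as well as timings
public class QueryBenchmarks
{
    private readonly int[] _data = Enumerable.Range(0, 100_000).ToArray();

    // Materialises everything before counting - the pattern the rules above warn about
    [Benchmark(Baseline = true)]
    public int MaterialiseThenCount() => _data.ToList().Count;

    // Counts without building an intermediate list
    [Benchmark]
    public int CountDirectly() => _data.Count();
}

public class Program
{
    // Run with: dotnet run -c Release
    public static void Main() => BenchmarkRunner.Run<QueryBenchmarks>();
}
```

BenchmarkDotNet prints a table comparing mean time and allocated memory per benchmark, which makes before/after comparisons of a query change straightforward.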
Databases are slow at doing bulk updates. It's generally significantly faster to break bulk processing down into manageable chunks. It also avoids other database users experiencing significant blocking issues.
LINQ includes the Chunk method to make this process nice and easy.
var productIds = context.Products
    .Select(x => x.ProductId)
    .ToList();

foreach (var chunk in productIds.Chunk(10))
{
    // Do stuff with one batch of 10 ids at a time
}
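Chunk itself is plain LINQ-to-objects, so its batching behaviour can be verified in isolation; this sketch assumes nothing beyond the standard library (note the final batch may be smaller than the batch size):

```csharp
using System.Linq;

public static class ChunkDemo
{
    // Splits a sequence into batches of at most `size` items;
    // the final batch holds whatever remains.
    public static int[][] Batch(int[] items, int size) =>
        items.Chunk(size).ToArray();
}
```

In a bulk update, each batch would typically be processed and saved before moving on to the next, keeping transactions short and reducing blocking for other database users.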
Raw SQL comes with risks but sometimes it is the best solution.
Using raw SQL involves taking care of SQL injection and other risks, however there are a number of situations where it may be the best solution.
The most obvious is a SQL UPDATE statement which updates a large number of rows.
await context.Database.ExecuteSqlInterpolatedAsync($"UPDATE Employees SET Active = {activeState}", ct);
Figure: Good example - Updating a large number of rows quickly with SQL
Database schema changes should be made by modifying your entities and generating an EF Core migration, rather than by changing the database directly. Migrations keep schema changes in source control and repeatable across environments.
Figure: Bad example - Don't modify the database directly
Figure: Good example - Modify entities and generate a migration
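A typical workflow, sketched with the EF Core command-line tools (the migration name here is an illustrative assumption):

```shell
# After changing your entity classes, generate a migration describing the schema change
dotnet ef migrations add AddEmployeeActiveFlag

# Apply the pending migrations to the target database
dotnet ef database update
```

This keeps each schema change in source control alongside the code that depends on it.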
When testing code that depends on Entity Framework Core, the challenge often lies in how to effectively mock out the database access. This is crucial for focusing tests on the functionality surrounding the DB access rather than the database interactions themselves. The EF Core In-Memory provider is a tool designed to address this need.
Common Pitfalls in Mocking
Trying to Mock DbContext

Attempting to mock the entire DbContext is a common mistake. This approach typically leads to complex and fragile test setups, making the tests hard to understand and maintain.

var mockContext = new Mock<ApplicationDbContext>();
// Adding further mock setups...
Figure: Bad Example - Mocking the entire DbContext is overly complex and error-prone.
Trying to Mock DbSet

Similarly, mocking DbSet entities often results in tests that don't accurately reflect the behavior of the database, leading to unreliable test outcomes.

var mockSet = new Mock<DbSet<MyEntity>>();
// Configuring mockSet behaviors...
Figure: Bad Example - Mocking DbSet entities fails to mimic real database interactions effectively.
Good Practice: Using DbContext with In-Memory Provider
Instead of extensive mocking, using DbContext with the EF Core In-Memory provider simplifies the setup and reduces the need for mocks. This approach enables more realistic testing of database interactions.

var options = new DbContextOptionsBuilder<ApplicationDbContext>()
    .UseInMemoryDatabase(Guid.NewGuid().ToString())
    .Options;

var dbContext = new ApplicationDbContext(options);
Figure: Good Example - Using DbContext with an EF Core In-Memory provider for simpler and more effective testing.
Caveat: Limitations of In-Memory Testing
While the EF Core In-Memory provider is useful for isolating unit tests, it's important to recognize its limitations:
- Behavioral Differences: It doesn't emulate all aspects of a SQL Server provider, such as certain constraints or transaction behaviors.
- Not Suitable for Query-focused Tests: For tests that focus on EF queries, more realistic results can be achieved through integration tests with an actual database.
Check out JK's EF Core Testing Repository for comprehensive examples and advanced scenarios in EF Core testing.