Sometimes, seemingly simple loops may hide memory consumption bugs. Let’s look at the following C# code snippet that’s responsible for doing maintenance on a list of users.

    long[] userIds = GetUserIdsForMaintenance();
    using (DbContext dbContext = new DbContext())
    {
        foreach (long id in userIds)
        {
            User user = dbContext.GetUser(id);
            // ... Do maintenance on user ...
        }
    }

As implied, each dbContext.GetUser(id) call issues a DB query that fetches a User. Many popular O/R Mapping frameworks, such as Entity Framework or NHibernate, cache the entities they fetch from the DB, so in our example every fetched User might be kept by the framework in its first-level cache (more about first-level caching: Entity Framework, NHibernate).
When our userIds list is very long, this cache can quickly fill up to a point where we run out of memory and receive an OutOfMemoryException.
How Bucketizing can help memory issues
One way to avoid these memory issues without turning off the caching feature is to periodically clear the cache before it fills up.
An easy way to do that would be to split our userIds into buckets and for each bucket to initialize a new DbContext instance:
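
    // The ID source is called before any DbContext exists, so it cannot be a dbContext method.
    IEnumerable<long> userIds = GetUserIdsForMaintenance();
    foreach (IEnumerable<long> idBucket in userIds.Bucketize(5000))
    {
        // A fresh context per bucket means a fresh, empty first-level cache.
        using (DbContext dbContext = new DbContext())
        {
            foreach (long id in idBucket)
            {
                User user = dbContext.GetUser(id);
                // ... Do maintenance on user ...
            }
        }
    }
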
What we see here is a new extension method called Bucketize that splits the long userIds list into buckets, each containing up to 5,000 IDs (the last bucket may be smaller).
When handling each bucket, we are creating a new instance of DbContext. This effectively clears the cache of the old DbContext instances by letting the garbage collector collect the entire object and free all of its memory.
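To make the effect concrete, here is a minimal sketch that measures how many entities a single context ends up holding. Everything in it is an assumption made for illustration: FakeDbContext is a hypothetical stand-in whose dictionary imitates a first-level cache (it is not the Entity Framework or NHibernate API), and the built-in Enumerable.Chunk (.NET 6 or later) is used in place of Bucketize only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an O/R mapper context: it remembers every entity
// it has loaded, the way a first-level cache would.
public sealed class FakeDbContext : IDisposable
{
    private readonly Dictionary<long, string> _cache = new Dictionary<long, string>();

    public int CachedCount => _cache.Count;

    public string GetUser(long id)
    {
        if (!_cache.TryGetValue(id, out var user))
        {
            user = "user-" + id; // pretend this row came from the database
            _cache[id] = user;
        }
        return user;
    }

    public void Dispose() => _cache.Clear();
}

public static class CacheGrowthDemo
{
    // Returns the largest number of entities any single context held at once.
    public static int PeakCacheSize(IEnumerable<long> userIds, int bucketSize)
    {
        int peak = 0;
        // Enumerable.Chunk is only a stand-in for Bucketize here.
        foreach (long[] idBucket in userIds.Chunk(bucketSize))
        {
            using (var dbContext = new FakeDbContext())
            {
                foreach (long id in idBucket)
                {
                    dbContext.GetUser(id);
                }
                peak = Math.Max(peak, dbContext.CachedCount);
            }
        }
        return peak;
    }
}
```

For 20,000 IDs, PeakCacheSize(ids, ids.Count) behaves like the original single-context loop and reports 20000, while PeakCacheSize(ids, 5000) reports 5000: the per-context cache is bounded by the bucket size rather than by the length of the list.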
What does the Bucketize code look like?
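
    using System.Collections.Generic;

    // Extension methods must live in a static class; the class name here is arbitrary.
    public static class EnumerableExtensions
    {
        public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> vals, int bucketSize)
        {
            var currentList = new List<T>();
            foreach (var element in vals)
            {
                // The current bucket is full: hand it to the caller and start a new one.
                if (currentList.Count == bucketSize)
                {
                    yield return currentList;
                    currentList = new List<T>();
                }
                currentList.Add(element);
            }
            // Nothing left over (empty input), so there is no final bucket to return.
            if (currentList.Count == 0)
            {
                yield break;
            }
            // Return the last, possibly partial, bucket.
            yield return currentList;
        }
    }
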
As you can see, Bucketize is an extension method for IEnumerable that uses the yield keyword to build each bucket lazily, only when the caller asks for the next one, instead of iterating over the entire collection up front.
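The lazy behavior can be observed with a small sketch. Bucketize is repeated here (in a slightly condensed form) only so that the example compiles on its own; CountingIds, Pulled and BucketSizes are hypothetical helpers introduced for this illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketizeDemo
{
    // Condensed copy of Bucketize, repeated so this sketch is self-contained.
    public static IEnumerable<IEnumerable<T>> Bucketize<T>(this IEnumerable<T> vals, int bucketSize)
    {
        var currentList = new List<T>();
        foreach (var element in vals)
        {
            if (currentList.Count == bucketSize)
            {
                yield return currentList;
                currentList = new List<T>();
            }
            currentList.Add(element);
        }
        if (currentList.Count > 0)
        {
            yield return currentList;
        }
    }

    // Hypothetical counter: how many IDs have been pulled from CountingIds so far.
    public static int Pulled;

    // Hypothetical ID source that records every ID it hands out.
    public static IEnumerable<long> CountingIds(int total)
    {
        for (long id = 1; id <= total; id++)
        {
            Pulled++;
            yield return id;
        }
    }

    // Sizes of the buckets produced for `total` IDs.
    public static List<int> BucketSizes(int total, int bucketSize) =>
        CountingIds(total).Bucketize(bucketSize).Select(b => b.Count()).ToList();
}
```

With these definitions, BucketSizes(12, 5) gives 5, 5, 2. After resetting Pulled to 0, CountingIds(1000000).Bucketize(5).First() pulls only 6 IDs from the source: five to fill the first bucket, plus the sixth, which is what tells the method that the bucket is full. The remaining IDs are never read.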
“Bucketizing” large data collections can help us overcome memory issues that are sometimes hidden behind seemingly simple loops.