In a previous post I detailed a mechanism for using structs as document keys in RavenDB. This is useful for simplifying a domain model, especially where a meaningful identifier already exists outside the application. In this post, however, I want to explore some of the potential performance issues with this approach if it is not used carefully and sparingly. The post starts by detailing steps that can be taken to maximise performance when working with structs in general, and then goes on to highlight some unavoidable performance costs of non-string identifiers in RavenDB that you should be aware of when deciding whether or not to take this approach.
Maximise Performance with Structs
Before we look at how RavenDB deals with non-string identifiers, I want to outline some important rules to follow when working with structs. Structs can improve performance when used correctly: for example, they are generally stored on the stack rather than the heap (although not always), and therefore carry less overhead and won’t add work for the garbage collector when destroyed. However, if they are used incorrectly they can actually damage performance.
When working with structs, do:
- Implement IEquatable<T> – implementing this interface helps avoid the overhead of boxing/unboxing-copying when checking equality
- Override Equals() and GetHashCode() – although ValueType already overrides these virtual methods for structs, the default implementations are general-purpose (and can fall back to reflection), so for improved performance you should provide your own
- Use StructLayoutAttribute – this defines how the CLR orders fields in memory. LayoutKind.Sequential is the default for structs, but unless you will be interoperating with unmanaged code you should use LayoutKind.Auto to improve performance
- Override ValueType.ToString() – provide your own implementation of ToString(). As an added bonus the JIT compiler is clever enough to emit code which executes this non-virtually
- Keep size below 16 bytes – although this is less of a concern if the struct won’t be passed to or returned from a method, i.e. where the overhead of copying fields is avoided
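The recommendations above can be sketched as a single struct. This is only an illustration: the field names, the hash combination and the "sortCode:accountNumber" string format are my assumptions, not necessarily how the AccountIdentifier used later in this post is written.

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;

// Hypothetical identifier struct: two ints, so 8 bytes in total (below the 16 byte guideline).
[StructLayout(LayoutKind.Auto)]
public struct AccountIdentifier : IEquatable<AccountIdentifier>
{
    private readonly int sortCode;
    private readonly int accountNumber;

    public AccountIdentifier(int sortCode, int accountNumber)
    {
        this.sortCode = sortCode;
        this.accountNumber = accountNumber;
    }

    public int SortCode { get { return sortCode; } }
    public int AccountNumber { get { return accountNumber; } }

    // Strongly typed equality: the argument is passed by value, so nothing is boxed.
    public bool Equals(AccountIdentifier other)
    {
        return sortCode == other.sortCode && accountNumber == other.accountNumber;
    }

    // The object-based overload must still exist; it delegates to the typed version.
    public override bool Equals(object obj)
    {
        return obj is AccountIdentifier && Equals((AccountIdentifier)obj);
    }

    // Simple combination of the two fields; any stable combination would do.
    public override int GetHashCode()
    {
        unchecked { return (sortCode * 397) ^ accountNumber; }
    }

    // Each int is converted to a string explicitly, so no int is boxed for concatenation.
    public override string ToString()
    {
        return sortCode.ToString(CultureInfo.InvariantCulture) + ":" +
               accountNumber.ToString(CultureInfo.InvariantCulture);
    }
}
```

With this in place, a call such as a.Equals(b) on two AccountIdentifier values binds to the typed overload rather than Equals(object).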
Don’t:
- Use interfaces lightly – interfaces are reference types, so if you cast your struct to an interface type you will unwittingly be boxing
- Call into base classes when overriding virtual methods – this will often result in boxing
- Cast structs as objects – unless you can’t avoid it… this will result in boxing
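To make the interface point concrete, here is a minimal sketch. The IResettable interface and the Tally struct are hypothetical and exist only to make the boxed copy observable; a mutable struct is used purely for demonstration.

```csharp
using System;

// Hypothetical interface and struct, used only for this demonstration.
public interface IResettable
{
    void Reset();
}

public struct Tally : IResettable
{
    public int Value;

    public void Reset()
    {
        Value = 0;
    }
}

public static class BoxingDemo
{
    // A constrained generic method works on the struct itself, so no box is created.
    public static void ResetInPlace<T>(ref T item) where T : struct, IResettable
    {
        item.Reset();
    }

    public static void Main()
    {
        var tally = new Tally { Value = 5 };

        // Casting to the interface boxes the struct: Reset() runs on a heap copy.
        IResettable boxed = tally;
        boxed.Reset();
        Console.WriteLine(tally.Value); // prints 5: the original is untouched

        // The constrained generic call operates on the original value.
        ResetInPlace(ref tally);
        Console.WriteLine(tally.Value); // prints 0
    }
}
```

The generic constraint is one way to keep the convenience of an interface without paying for a box on every call.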
Scenario
Now on to using structs as identifiers in RavenDB. To understand more about how RavenDB handles struct identifiers, let’s consider a simple scenario. We have a BankAccount reference type where the document key is a struct of type AccountIdentifier. The AccountIdentifier simply encapsulates both the account number and sort code. In a basic Console app we are going to:
- Instantiate a new BankAccount (reference object) with an AccountIdentifier id (struct)
- Pass the BankAccount to the Store(dynamic entity) method of the RavenDB DocumentSession
- Call SaveChanges to persist the object
- Reload the object using Load<T>(ValueType id)
- Make a change to the BankAccount object
- Save the changes by calling SaveChanges()
We will then look at the number of boxing/unboxing-copying actions taking place which wouldn’t have occurred when using regular string identifiers (remember that these findings also hold for int or Guid identifiers!).
The Code
The code below outlines the basic flow of the scenario.
```csharp
static void Main(string[] args)
{
    // sortCode and accountNumber are assumed to be int values defined elsewhere.
    var account = new BankAccount
    {
        Id = new AccountIdentifier(sortCode, accountNumber),
        AccountHolder = "Gary"
    };

    // Store and persist the new account.
    using (var session = SessionFactory())
    {
        session.Store(account);
        session.SaveChanges();
    }

    // Reload the account by its struct identifier, change it and persist again.
    using (var session = SessionFactory())
    {
        var loadedAccount = session.Load<BankAccount>(new AccountIdentifier(sortCode, accountNumber));
        loadedAccount.AccountHolder = "Gary Crawford";
        session.SaveChanges();
    }
}
```
In order for RavenDB to know how to handle the non-string identifier we must provide an implementation of ITypeConverter that defines how the identifier is converted to and from its string form. Note that this is a key area for additional boxing/unboxing-copying.
```csharp
using System;
using Raven.Client.Converters;

public class AccountIdentifierConverter : ITypeConverter
{
    public bool CanConvertFrom(Type sourceType)
    {
        return sourceType == typeof(AccountIdentifier);
    }

    public string ConvertFrom(string tag, object value, bool allowNull)
    {
        // value arrives boxed, so this cast unboxes and copies it.
        var identifier = (AccountIdentifier)value;
        if ((identifier.SortCode == 0 && identifier.AccountNumber == 0) && allowNull)
        {
            return null;
        }

        return string.Concat(tag, identifier.ToString());
    }

    public object ConvertTo(string value)
    {
        var values = value.Split(new char[] { ':' });
        var sortCode = Convert.ToInt32(values[0]);
        var accountNumber = Convert.ToInt32(values[1]);

        // The object return type boxes the new struct.
        return new AccountIdentifier(sortCode, accountNumber);
    }
}
```
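The converter relies on an implicit contract: ToString() on the identifier must produce the same "sortCode:accountNumber" format that ConvertTo parses. The sketch below mirrors the two conversion methods without any RavenDB dependency so the round trip can be run in isolation. The "bankaccounts/" tag, the struct layout and the assumption that the tag prefix is stripped before the value reaches ConvertTo are all mine, inferred from the way ConvertTo reads the first segment directly.

```csharp
using System;
using System.Globalization;

// Minimal stand-in for the identifier; the string format is an assumption.
public struct AccountIdentifier
{
    public readonly int SortCode;
    public readonly int AccountNumber;

    public AccountIdentifier(int sortCode, int accountNumber)
    {
        SortCode = sortCode;
        AccountNumber = accountNumber;
    }

    public override string ToString()
    {
        return SortCode.ToString(CultureInfo.InvariantCulture) + ":" +
               AccountNumber.ToString(CultureInfo.InvariantCulture);
    }
}

public static class KeyRoundTrip
{
    // Mirrors ConvertFrom: the object parameter makes the caller box, the cast unboxes.
    public static string ToKey(string tag, object value)
    {
        var identifier = (AccountIdentifier)value;
        return string.Concat(tag, identifier.ToString());
    }

    // Mirrors ConvertTo: the object return type boxes the rebuilt struct.
    public static object FromKey(string value)
    {
        var parts = value.Split(':');
        return new AccountIdentifier(
            int.Parse(parts[0], CultureInfo.InvariantCulture),
            int.Parse(parts[1], CultureInfo.InvariantCulture));
    }

    public static void Main()
    {
        const string tag = "bankaccounts/"; // hypothetical tag value
        var id = new AccountIdentifier(123456, 12345678);

        string key = ToKey(tag, id);
        Console.WriteLine(key); // prints bankaccounts/123456:12345678

        // Strip the tag before parsing, as assumed above.
        var restored = (AccountIdentifier)FromKey(key.Substring(tag.Length));
        Console.WriteLine(restored.SortCode == id.SortCode &&
                          restored.AccountNumber == id.AccountNumber); // prints True
    }
}
```

If the ToString() format and the parsing logic ever drift apart, this kind of round trip is the first thing to break, so it is worth covering with a unit test.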
Findings
This relatively simple scenario of creating and saving an object, then re-loading it to make changes before persisting those changes, results in no fewer than 12 additional boxing/unboxing-copying operations when using a struct identifier rather than a string identifier. This may be an acceptable overhead for simple, once per visit, queries (e.g. authentication only occurring at the start of a session) but it is a performance overhead that could very quickly grow if used iteratively.
To highlight where these boxing/unboxing-copying overheads come from I have detailed the RavenDB Store, SaveChanges and Load<T> processes in UML sequence diagrams, noting the occurrences in red with a side note. You can access these with the links below. (Please note, these diagrams only highlight the parts of the flows immediately concerned with ValueType identifiers – there is a lot more going on which I have excluded for the purpose of this post).
- Click here for the Store(dynamic entity) sequence diagram.
- Click here for the SaveChanges() sequence diagram.
- Click here for the Load<T>(ValueType id) sequence diagram.
For clarity, I have also included screenshots of the IL generated from the simple console app below. The diagram immediately below highlights the unboxing and copying IL instructions generated by the compiler from the C# code listed above. This IL specifically deals with the ConvertFrom method in the AccountIdentifierConverter class. The unboxing and copying is a result of the struct being passed into the method as an object, and thereby having been previously boxed.
The diagram below highlights the boxing occurrence created by the compilation of the ConvertTo method in the AccountIdentifierConverter. The boxing is a result of the object return type on the method.
The diagram below is the IL for the Main method of the console app, and clearly shows the boxing which occurs to Load the persisted entity from RavenDB.
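The boxing at the Load call site can be reproduced without RavenDB. In the sketch below, Receive is a hypothetical stand-in for any method that takes a ValueType parameter, such as Load<T>(ValueType id); because ValueType is itself a reference type, the compiler has to emit a box instruction every time a struct is passed to it.

```csharp
using System;

// Minimal stand-in for the identifier struct.
public struct AccountIdentifier
{
    public readonly int SortCode;
    public readonly int AccountNumber;

    public AccountIdentifier(int sortCode, int accountNumber)
    {
        SortCode = sortCode;
        AccountNumber = accountNumber;
    }
}

public static class LoadBoxingDemo
{
    // Hypothetical stand-in for a method with a ValueType parameter.
    public static object Receive(ValueType id)
    {
        return id; // already a reference to the boxed copy
    }

    public static void Main()
    {
        var id = new AccountIdentifier(123456, 12345678);

        // Each call boxes the struct again, producing a separate heap object.
        object first = Receive(id);
        object second = Receive(id);

        Console.WriteLine(ReferenceEquals(first, second)); // prints False
        Console.WriteLine(first.Equals(second));           // prints True: same field values
    }
}
```

Compiling this and opening the assembly in a disassembler such as ildasm should show a box AccountIdentifier instruction before each call to Receive.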
Summary
It’s important to adhere to certain rules when working with structs in order to maximise performance, but when used correctly they are an exceptionally useful tool in any developer’s bag.
Using a struct as an identifier for an object to be stored in RavenDB has some nice advantages in terms of domain model simplification. However, this must always be weighed against the additional, and costly, boxing and unboxing-copying operations which will result. In particular this shouldn’t be used iteratively – one-off operations are much better candidates for this mechanism. It is worth highlighting again, however, that all the findings in this post apply equally to int or Guid identifiers in RavenDB – not just custom structs.
Happy coding!
The post Value Objects as Document Keys in RavenDB: Performance Considerations appeared first on Passion for code.