I have been playing with a model that only just fits on my 4 GB VM (on purpose, to test something). 1 GB is already taken by other models and the OS. Using SSDT I imported 6 identical tables of 2 million rows each into my model, one table at a time.
I had no problem here; the model took 2.5 GB of memory. I then deployed the model to the server without processing it, just the metadata. Next I closed SSDT, which unloaded the SSDT workspace and freed up the memory. So now I have 3 GB free and an unprocessed database on the server.
I now processed the database using Process Full. To my surprise it failed with an out-of-memory error. What happened here? Why did my model fit while I was building it, but not while I was processing it?
The difference is that when you process the whole database, the server processes all tables in parallel, instead of one table at a time as I did during importing. The key to understanding why this causes memory issues is understanding what goes on during processing. When a table is processed it goes through several steps:
- Read and encode the data into memory
- Compress the data
- Calculate the relationships, calculated columns, hierarchies, etc.
So as you can see, the first step is reading and encoding all the data; at this point it is not compressed yet. If multiple tables are processed at once, the data for all of those tables is read into memory at the same time, uncompressed. This means you need much more memory on the server during processing than the processed database needs afterwards. It is worth pointing out that the engine actually reads the data in segments of 8 million rows per pass before passing each segment on to be compressed; in my case that didn't matter, since each of my tables was only 2 million rows.
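To make that concrete, here is a rough back-of-the-envelope sketch comparing the peak memory when all six tables are in the read/encode step at once versus one at a time. The bytes-per-row numbers are made-up assumptions for illustration, not measurements from my model; swap in your own figures.

```python
# Rough back-of-the-envelope estimate of peak memory during processing.
# The per-row sizes are illustrative assumptions, not measured values.

ROWS_PER_TABLE = 2_000_000          # each of my test tables
TABLES = 6                          # identical tables in the model
SEGMENT_ROWS = 8_000_000            # the engine reads up to 8M rows per segment
BYTES_PER_ROW_UNCOMPRESSED = 200    # assumed raw/encoded size before compression
BYTES_PER_ROW_COMPRESSED = 25       # assumed size after compression

def peak_bytes(parallel_tables: int) -> int:
    """Peak memory while `parallel_tables` tables are in the read/encode step.

    Each in-flight table holds at most one segment's worth of uncompressed
    rows; tables smaller than a segment (like mine) are held entirely.
    Tables that are already finished only cost their compressed size."""
    in_flight_rows = min(ROWS_PER_TABLE, SEGMENT_ROWS)
    uncompressed = parallel_tables * in_flight_rows * BYTES_PER_ROW_UNCOMPRESSED
    finished = (TABLES - parallel_tables) * ROWS_PER_TABLE * BYTES_PER_ROW_COMPRESSED
    return uncompressed + finished

GB = 1024 ** 3
print(f"All 6 tables in parallel : {peak_bytes(6) / GB:.2f} GB at peak")
print(f"One table at a time      : {peak_bytes(1) / GB:.2f} GB at peak")
```

Whatever numbers you plug in, the parallel peak scales with the number of tables holding uncompressed data at the same time, while the one-at-a-time peak stays close to a single table's uncompressed size.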
Now that you know this, it's pretty easy to come up with an alternative processing method: process each table separately instead of in parallel, so the memory is freed before the next table is processed. In my example this allowed me to process the database without any memory pressure.
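If you want to script this instead of clicking through the UI, one way (a sketch, not the only option) is to submit an XMLA batch with MaxParallel set to 1, so the tables are processed one after the other. The database and table IDs below are placeholders for my test model; replace them with the IDs from your own model (in a 2012 tabular model the tables show up as dimensions in XMLA), and run the resulting script from an XMLA query window in SSMS or any tool that can submit XMLA.

```python
# Sketch: build an XMLA batch that processes the tables one at a time by
# limiting parallelism to 1. Database/dimension IDs are placeholders.

DATABASE_ID = "MyTestModel"                      # placeholder database ID
TABLE_IDS = [f"Table{i}" for i in range(1, 7)]   # placeholder IDs for my 6 tables

process_commands = "\n".join(
    f"""    <Process>
      <Object>
        <DatabaseID>{DATABASE_ID}</DatabaseID>
        <DimensionID>{table_id}</DimensionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>"""
    for table_id in TABLE_IDS
)

xmla_batch = f"""<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel MaxParallel="1">
{process_commands}
  </Parallel>
</Batch>"""

print(xmla_batch)
```

With MaxParallel="1" the server finishes (and compresses) each table before it starts reading the next one, which is exactly what kept the peak memory low in my test.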
If you want to know more about what goes on during processing, check out this awesome session by my colleagues Ashvini and Allan: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI414
I am trying to determine when I will run out of memory on my server. Currently I have no problems with space, but I would love to forecast the next n+ years based on historical growth. Which object should I be watching? How do I know whether processing x amount of data will approach or hit the out-of-memory error? When restoring a model, does the indicator change?
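As a rough illustration of the forecasting part of this question, here is a minimal sketch that fits a linear trend to historical sizes and projects when a limit would be reached. The monthly sizes and the 3 GB limit are made-up sample numbers; in practice you would log the processed database size, or the memory usage of the Analysis Services process (msmdsrv.exe), after each processing run and fit against that.

```python
# Sketch: project future model size from historical measurements with a simple
# least-squares linear fit. The monthly sizes are made-up sample data.

historical_gb = [1.8, 1.9, 2.1, 2.2, 2.4, 2.5]   # assumed size per month, in GB
memory_limit_gb = 3.0                             # memory you can spend on this model

n = len(historical_gb)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(historical_gb) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, historical_gb)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

# Months from the last measurement until the fitted trend crosses the limit.
months_left = (memory_limit_gb - intercept) / slope - (n - 1)
print(f"Growing ~{slope:.2f} GB per month; "
      f"roughly {months_left:.1f} months until the {memory_limit_gb} GB limit.")
```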