Processing partitions \ tables in parallel

One thing that has come up several times in the last few weeks is a misconception that you cannot process multiple partitions on the same table or tables in parallel as it would cause a lock. This could be true if you try to do the parallelism yourself. Like in SQL Server you need to think about transactions, the AS engine is a transactional system too.

So, the AS engine is definitely capable of loading data in parallel but only if you let him do the puzzling on concurrency. This means you must send processing commands to the AS engine in one transaction so the AS engine can manage the locks itself. There are other benefits of letting AS doing the work like recalculating the calculated items (tables, columns etc) once instead of multiple times which improves processing performance.

Let’s look at an example. Here I have a table with 3 partitions:

If I would process them in separate commands, it might run into concurrency issues like a lock. But the key here is to process them in one command like this:

{
  "refresh": {
    "type": "automatic",
    "objects": [
      {
        "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
        "table": "Sales",
        "partition": "FactOnlineSalesVBig - Copy"
      },
      {
        "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
        "table": "Sales",
        "partition": "FactOnlineSalesVBig - Copy - Copy"
      }
    ]
  }
}

Now the AS engine will process them in parallel as much as possible. The key here is to provide all the elements you want to process in the objects array. This will allow the engine to run it in parallel and figure out how to do that.

Now this object array is not just for partitions of the same table. I can also add other objects in the same model, like a different table. Again, it will run the refresh in parallel as much as possible.

{
  "refresh": {
    "type": "automatic",
    "objects": [
      {
        "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
        "table": "Sales",
        "partition": "FactOnlineSalesVBig - Copy"
      },
      {
        "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
        "table": "Sales",
        "partition": "FactOnlineSalesVBig - Copy - Copy"
      },
      {
        "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
        "table": "DimDate"
      }
    ]
  }
}

Now if you DO NOT want them to run in parallel or you want to other operations as well you can use the sequence command. Here you can specify the maxParallelism on how much it can run in parallel . This would look like this:

{
"sequence": {
    "maxParallelism": 1,
    "operations": [{
        "refresh": {
            "type": "automatic",
            "objects": 
            [
                {
                    "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
                    "table": "Sales",
                    "partition": "FactOnlineSalesVBig - Copy"
                },
                {
                    "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
                    "table": "Sales",
                    "partition": "FactOnlineSalesVBig - Copy - Copy"
                },
                {
                    "database": "TabularProject1_kadejo_e73a0d6f-4047-409a-b407-a6354b83b0d0",
                    "table": "DimDate"
                }
            ]
        }
	}]
	}
}

This works on all types of the SSAS engine like AS on premise, Azure AS and Power BI datasets.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.