We recently released the first version of an operational, near real-time BI platform for monitoring parcels in a package sorter for a Dutch company. This blog post describes how we solved their problem using Reporting Services, Integration Services and ASP.NET, hosted in the cloud on Amazon Elastic Compute Cloud (EC2).
The scenario
There are 4 sorting centers in the Netherlands where parcels are collected from the surrounding region. Each parcel receives a unique code (a barcode) on collection. The sorting center then has to determine the destination of a parcel by postal code; parcels are eventually delivered by regional delivery stations. The parcels are grouped by the sorting center and shipped to the delivery stations during the night, with the delivery station that is farthest away getting its parcels first. When a parcel's destination is closer to another sorting center, it is sent to that sorting center first, so parcels not within the range of the sorting center are coarsely sorted by region, while parcels within range are finely sorted by street. In the sorting center, parcels are loaded onto the assembly line of a sorter machine, where the postal code is scanned and translated to a digital postal code that is attached to the barcode. The sorter then determines the destination assembly line, called a chute.
Currently, information is delivered through reports full of numbers that are hard to interpret and an Excel sheet full of VBA code that is e-mailed around the organisation. Each sorting center hosts its own server running Reporting Services 2000 and a Visual Basic service.
The challenge
The challenge the client set us was to:
- Consolidate data from different sources on one report
- Replace the local machines with one central machine
- Bring the right data to the right people: sorting-center data to a manager, team data to a team lead
- Show the data as soon as it is available (near real-time BI)
- Make the data extendable with data from other sources
- Create new ad-hoc reports in a short development cycle
How we did it: Project Startup
The project started with some brown paper sessions with different users from the organisation. In these sessions we determined the essential data needed to manage the sorting centers. The focus of the first release was to replace the current reports and Excel sheet with the new reporting environment. The data is delivered to our system as text files; new files are created every minute and have to be imported and transformed by the application, with an estimated 4 files per minute.
After designing reports for the different user groups, determining the MoSCoW prioritisation and analyzing the different input files, we decided on the following architecture:
- Loading and transforming data with SSIS
- Database on SQL Server 2008, with data in report tables at different aggregation levels; some data is needed at the lowest grain, but most data is queried over a larger time dimension (a sketch of such tables follows this list)
- Reports with Reporting Services 2008
- Reporting portal with ASP.NET 3.5, the ReportViewer control and ASP.NET Dynamic Data for the master data tables
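To give an idea of what report tables at different aggregation levels can look like, here is a minimal T-SQL sketch. The table and column names are made up for this post and are not the actual data model:

```sql
-- Hypothetical report tables at two aggregation levels (names are illustrative only).
CREATE TABLE dbo.ParcelCountsPerMinute (
    SorterCenterId int      NOT NULL,
    ChuteId        int      NOT NULL,
    MeasuredAt     datetime NOT NULL,   -- minute grain
    ParcelCount    int      NOT NULL,
    CONSTRAINT PK_ParcelCountsPerMinute PRIMARY KEY (SorterCenterId, ChuteId, MeasuredAt)
);

CREATE TABLE dbo.ParcelCountsPerHour (
    SorterCenterId int      NOT NULL,
    ChuteId        int      NOT NULL,
    MeasuredHour   datetime NOT NULL,   -- hour grain
    ParcelCount    int      NOT NULL,
    CONSTRAINT PK_ParcelCountsPerHour PRIMARY KEY (SorterCenterId, ChuteId, MeasuredHour)
);
```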
While brainstorming with the client it appeared they would like to host the system outside their own infrastructure. We already had some experience with hosting on Amazon Elastic Compute Cloud and decided to determine the feasibility of using Amazon for our BI system. Since the data is delivered in small files every minute, a solution was quickly found: servers hosted in the cloud can fetch files just as easily as local servers. The server in the cloud fetches the data using SFTP. The Amazon cloud is also very flexible, letting you quickly scale capacity, both up and down, as your computing requirements change.
Implementation
Below is a graphical representation of the architecture used:
The application consists of the following steps:
- Data from the system is enriched with descriptive master data; this master data is stored in a relational data model that can be maintained through pages generated by ASP.NET Dynamic Data
- Files are transferred to the system by an SSIS package that calls an SFTP application to move (copy and remove) the files to the BI server; this package is scheduled to run every minute.
- Files are loaded by two SSIS packages per file type. One package fetches all files currently in a configured directory, using a loop that traverses the directory, and passes each file to a second package that imports it into the system. The data from the file is loaded into SSIS memory and enriched using the Lookup component; rows for which no descriptive data is found are written to an error table. The remaining data is written to a table at minute grain. These file-watching packages are scheduled to run every minute, and the current system performs excellently with 30 files arriving in one minute. (A set-based sketch of the enrichment step follows this list.)
- A lot of data is added to the lowest-grain table, so to keep performance acceptable some aggregation is applied. Every minute a scheduled SSIS package selects the last 2 hours of data from the minute table into a temporary table, grouped by hour and summing the measures. That data is then merged into an hour aggregation table using the SQL Server 2008 MERGE statement, which updates the measures when a row already exists and inserts it when it does not. This ensures the aggregate is incrementally maintained on each run (see the MERGE sketch after this list).
- Reports also show live data from the last 15 minutes, so a package runs every minute that fetches the last 15 minutes of data and places it into a dedicated table; this table always contains exactly 15 minutes of data (a refresh sketch follows this list as well).
- There are different reports, one for each user group in the organisation. The sorting center manager, for example, gets a report showing all teams at work; he can then click on a team and drill down to a report with the data of that team.
- Users who have a report open want to see the new data as soon as it is loaded, so we used the AutoRefresh property in RDL to automatically refresh the report every minute.
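As promised above, here is what the enrichment done by the Lookup component boils down to, expressed as set-based T-SQL. The table and column names (StagedFileRows, Chute, ImportErrors, plus the hypothetical tables sketched earlier) are made up for this post; the point is only to show how matched rows end up in the minute table while unmatched rows are routed to an error table:

```sql
-- Rows whose chute code is found in the master data go to the minute-grain table.
INSERT INTO dbo.ParcelCountsPerMinute (SorterCenterId, ChuteId, MeasuredAt, ParcelCount)
SELECT s.SorterCenterId, c.ChuteId, s.MeasuredAt, s.ParcelCount
FROM   dbo.StagedFileRows AS s
JOIN   dbo.Chute          AS c ON c.ChuteCode = s.ChuteCode;

-- Rows without matching master data go to an error table for later inspection.
INSERT INTO dbo.ImportErrors (SorterCenterId, ChuteCode, MeasuredAt, ParcelCount)
SELECT s.SorterCenterId, s.ChuteCode, s.MeasuredAt, s.ParcelCount
FROM   dbo.StagedFileRows AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Chute AS c WHERE c.ChuteCode = s.ChuteCode);
```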
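The hour aggregation step is essentially a MERGE against the hour table. A minimal sketch, again using the hypothetical minute and hour tables from above; take it purely as an illustration of the update-or-insert behaviour, not the exact statements the package runs:

```sql
-- Re-aggregate the last 2 hours of minute-grain data to hour grain.
SELECT SorterCenterId,
       ChuteId,
       DATEADD(hour, DATEDIFF(hour, 0, MeasuredAt), 0) AS MeasuredHour,
       SUM(ParcelCount)                                AS ParcelCount
INTO   #HourlyTotals
FROM   dbo.ParcelCountsPerMinute
WHERE  MeasuredAt >= DATEADD(hour, -2, GETDATE())
GROUP  BY SorterCenterId, ChuteId, DATEADD(hour, DATEDIFF(hour, 0, MeasuredAt), 0);

-- Merge into the hour aggregation table: update rows that already exist, insert the rest.
MERGE dbo.ParcelCountsPerHour AS target
USING #HourlyTotals           AS source
   ON  target.SorterCenterId = source.SorterCenterId
   AND target.ChuteId        = source.ChuteId
   AND target.MeasuredHour   = source.MeasuredHour
WHEN MATCHED THEN
    UPDATE SET target.ParcelCount = source.ParcelCount
WHEN NOT MATCHED THEN
    INSERT (SorterCenterId, ChuteId, MeasuredHour, ParcelCount)
    VALUES (source.SorterCenterId, source.ChuteId, source.MeasuredHour, source.ParcelCount);
```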
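For the rolling 15-minute table, one straightforward approach is to rebuild it on every run; the table is tiny, so a truncate-and-reload keeps it exactly 15 minutes wide. This is a sketch under the same hypothetical table names (including a made-up ParcelCountsLast15Minutes table), not necessarily how the production package does it:

```sql
-- Rebuild the "last 15 minutes" snapshot table on every run.
TRUNCATE TABLE dbo.ParcelCountsLast15Minutes;

INSERT INTO dbo.ParcelCountsLast15Minutes (SorterCenterId, ChuteId, MeasuredAt, ParcelCount)
SELECT SorterCenterId, ChuteId, MeasuredAt, ParcelCount
FROM   dbo.ParcelCountsPerMinute
WHERE  MeasuredAt >= DATEADD(minute, -15, GETDATE());
```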
Finally, a screenshot of one of the SSRS reports hosted in the ASP.NET application:
Overall it was a great project to work on and we are very pleased with the result 🙂 I hope this gives you a good idea of how we used the MS BI stack to create this operational BI tool.