Skip to content

Building a Big Data Platform for Audience Measurement

Client background

A large international TV and broadband company, with operations in +10 European countries. They invest in the infrastructure and digital platforms that empower their customers to make the most of the video, internet and communications revolution.

They develop products delivered through fibre-based networks that connect millions of customers subscribing to  millions of TV, broadband internet and telephone services and also serve lots of mobile subscribers.

The challenge: A future ready Big Data Solution

This company is receiving a lot of data from set-top boxes for Linear TV. They combine this data with channel and schedule information from other sources and formats to get insight in user activity. they had a Data Warehouse solution that was not performing fast enough, nor was it future ready for the anticipated excess processing. Daily reports took hours to be processed. They decided to implement a full Big Data solution, based on the Hadoop platform and technologies. Devoteam’s challenge was to (re)build a new big data platform and use or improve the logic for data transformation that was used in the legacy.

viewers-linear-tv-audience-measurements

Example data of viewers of linear TV

The solution: A Hadoop Big Data Platform with improved business logic

This project consisted of two parts: 1) building the big data platform and 2) implementing the business logic for data.

The Big Data Platform

Devoteam built a big data platform that consisted of several layers. In the first layer, Data & Privacy Gateway, the data from the set-top boxes was ingested with Kafka after which it was anonymized and sent to the Landing Zone of the Azure Data Lake using Apache NiFi. In the Landing Zone the data still was in raw format. Next, the data was cleansed and uniformed into Avro format and sent to the Azure Curated Data Lake. The next layer was the Business Repository in which the business logic for data transformation was applied. The final layer was the consumption layer in which advanced analytics could be done. The Landing Zone, Data Lake and Business Repository were running on a Hadoop environment. The Hadoop Jobs were managed by Oozie and the data pipelines were orchestrated in Oozie. The solution provided by Devoteam significantly reduced the processing time of the whole workflow.

audience-measurements

Improving Business Logic

The business logic used in the business repository layer was extracted from the old legacy. However, Devoteam found some flaws in the old code and proposed ways to improve the business logic resulting in higher data quality.

Devoteam’s added value: Technology for people

Our solution significantly reduced the processing time. Furthermore, we improved the logic for data transformation and therefore the quality of the data. The improved data quality enables our client to better serve the end-users by providing better fitting recommendations (in the mobile/web application), adding improved personalization aspects to linear TV.

Powering business decisions with data & Analytics

Being more effective and making quicker decisions with real-time data analysis and an integrated data strategy based on data quality.