
Azure Data Lake: Grow Fearless – Part 2

 

In my previous blog post, “Azure Data Lake: Grow Fearless – Part 1”, I described the business benefits of data growth as well as the challenges of fast-growing data. In this post, I’ll dive deeper and show you the strengths of Azure Data Lake and why it’s worth considering.

Why should you consider Azure Data Lake?

Scalability

  • Virtually unlimited storage supports rapid business growth.
  • Easy scaling prepares the business for any big leap in development.

Flexibility

  • Azure Data Lake Storage Gen2 (ADLS Gen2) can store any kind of data: structured, semi-structured, and unstructured.
  • Data can be stored in raw form and explored in its native format without going through a traditional ETL process first.
  • ADLS Gen2 connects easily with other Azure components. It can serve source data to data warehouses or databases and act as a landing area for data transformed by Azure Data Factory and Azure Databricks. Additionally, Power BI can connect to it directly for analysis.

High availability

ADLS Gen2 always keeps multiple copies of your data to be prepared for unexpected failures and disasters. At least three copies are kept within the primary region. Azure Storage redundancy options range from local to global and protect against local, zonal, or regional outages and disasters. Besides disaster recovery, these replicas can also be used to repair corrupted data and so safeguard data integrity.

Security

  • Auditing: diagnostic logging can be enabled to record data access traces.
  • Access control: ADLS Gen2 provides access control on individual files and folders and integrates with Azure Active Directory for identity and access management. Permissions can be granted to specific users or groups on specific files or folders. Multi-factor authentication, role-based access control (RBAC), monitoring, and alerting are also available.
  • Encryption: data stored in ADLS Gen2 is encrypted by default (encryption at rest). Microsoft can manage the encryption key for you, or you can choose to manage it yourself. You can also require secure transfer so that data is encrypted in transit.
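To make the access control point more concrete, here is a minimal sketch of how a POSIX-style access control list (ACL) for a folder could be assembled as a string. ADLS Gen2 ACLs use comma-separated entries of the form `scope:id:permissions`. The helper names `acl_entry` and `build_acl` and the all-zero object ID are hypothetical and only serve as illustration.

```python
from typing import Iterable


def acl_entry(scope: str, permissions: str, principal_id: str = "") -> str:
    """Build one ACL entry such as 'user:<object-id>:r-x'.

    An empty principal_id addresses the owning user or owning group.
    """
    if scope not in ("user", "group", "mask", "other"):
        raise ValueError(f"unknown scope: {scope}")
    # Permissions are three characters: read, write, execute, or '-' if absent.
    valid = len(permissions) == 3 and all(
        ch in (flag, "-") for ch, flag in zip(permissions, "rwx")
    )
    if not valid:
        raise ValueError(f"invalid permissions: {permissions}")
    return f"{scope}:{principal_id}:{permissions}"


def build_acl(entries: Iterable[str]) -> str:
    """Join entries into the comma-separated ACL string."""
    return ",".join(entries)


# Hypothetical Azure AD object ID of an analyst who gets read-only access.
analyst_id = "00000000-0000-0000-0000-000000000000"

acl = build_acl([
    acl_entry("user", "rwx"),              # owning user: full access
    acl_entry("user", "r-x", analyst_id),  # named user: read and list only
    acl_entry("group", "r-x"),             # owning group
    acl_entry("mask", "r-x"),              # upper bound for named entries
    acl_entry("other", "---"),             # everyone else: no access
])
print(acl)
```

A string like this can then be applied to a directory or file, for example through the `acl` argument of `set_access_control` in the Azure Data Lake Python SDK or through Azure Storage Explorer.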

Better Performance

  • Query performance: ADLS Gen2 has a hierarchical file system. This not only enables the granular security mentioned above but also lets you partition data into folders. Query engines that support partition pruning can then skip irrelevant folders, which can improve query performance considerably.
  • Data load: ADLS Gen2 can relocate data (for example, by renaming or moving a directory) through a metadata-only operation, which is fast and cost-efficient.
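The effect of folder partitions on query performance can be illustrated with a small sketch. It assumes a `key=value` folder naming convention (for example `year=2023/month=01`), which is a common choice but not something ADLS Gen2 requires. The function names and sample paths are made up for this example, and real query engines do this pruning internally.

```python
from typing import Dict, Iterable, List


def parse_partitions(path: str) -> Dict[str, str]:
    """Extract key=value folder segments from a path, e.g. year=2023."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def prune(paths: Iterable[str], **wanted: str) -> List[str]:
    """Keep only paths whose partition folders match every requested value."""
    selected = []
    for path in paths:
        parts = parse_partitions(path)
        if all(parts.get(key) == value for key, value in wanted.items()):
            selected.append(path)
    return selected


# Hypothetical file listing of a partitioned sales dataset.
paths = [
    "sales/year=2022/month=12/part-0.parquet",
    "sales/year=2023/month=01/part-0.parquet",
    "sales/year=2023/month=02/part-0.parquet",
]

# Only files under year=2023/month=01 need to be read for this query.
january = prune(paths, year="2023", month="01")
print(january)
```

Because the filter is decided from the folder names alone, files in the other partitions are never opened, which is where the performance gain comes from.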

How to Load Data to Azure Data Lake

There are several ways to load data into Azure Data Lake, depending on the situation. The following table shows some use cases and the tools that can be used.

[Table: how to load data to Azure Data Lake]

Data quality

One of our main priorities is to provide the business with accurate and relevant data in time for decision-making. Together with other services, such as Azure Synapse Analytics, Azure Databricks, and Azure Data Catalog, ADLS Gen2 helps achieve better data quality in terms of accuracy, timeliness, integrity, and relevance.

Although virtually unlimited storage of different types of data is impressive, storing a large amount of data without any organization can be a disaster. Follow these best practices to maintain a healthy data lake instead of turning it into a data dump.

  • Have a data storage plan in place before loading data. We can organize the data lake into layers and folders that serve different use cases. For example, we can organize data into a raw data layer, an ETL layer, and a consumption layer. Data can be organized further per client, source channel, year, and month.
  • Use Azure Data Catalog to make data easy to find and understand.
  • Purge duplicated and unneeded data. Periodically check whether the data needs to be reorganized to fit business developments and adjust if necessary.
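The layered folder layout described above could be captured in a small helper so that every pipeline writes to a predictable location. This is only a sketch: the layer names follow the example in this post, while the function name `lake_path` and the client, source, and file names are hypothetical.

```python
from datetime import date

# Layers from the example above: raw data, ETL, and consumption.
LAYERS = ("raw", "etl", "consumption")


def lake_path(layer: str, client: str, source: str, day: date, filename: str) -> str:
    """Build a folder path of the form layer/client/source/year/month/file."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"{layer}/{client}/{source}/{day:%Y}/{day:%m}/{filename}"


# Hypothetical client, source channel, and file name.
path = lake_path("raw", "contoso", "webshop", date(2023, 3, 7), "orders.json")
print(path)
```

Rejecting unknown layer names early keeps ad hoc folders from creeping in, which is one of the ways a data lake slowly turns into a data dump.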

Security

As mentioned above, ADLS Gen2 supports auditing, access control, and encryption. Data professionals can make the most of these features by following a few best practices.

  • Analyze audit logs periodically to control quality and identify risks.
  • Apply granular access control per folder/file and role. Update access control promptly when people join, leave, or change positions.
  • Rotate the encryption key periodically.

Lifecycle management

Managing and consuming data efficiently is what we want to achieve throughout the whole life cycle. ADLS Gen2 offers several access tiers, including hot, cool, and archive, with decreasing storage cost and increasing retrieval time. We can therefore store data more cost-effectively based on how often it is accessed. ADLS Gen2 also has a built-in lifecycle management feature that lets you define rules to move data to the right tier automatically.
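As an illustration, a lifecycle management policy is defined as a JSON document containing one or more rules. The sketch below builds such a rule in Python. The rule name, the `datalake/raw/` prefix (container name followed by folder), and the 30-day and 180-day thresholds are assumptions chosen for this example, not recommendations.

```python
import json


def lifecycle_rule(name: str, prefix: str, cool_after_days: int, archive_after_days: int) -> dict:
    """Build one rule that tiers files to cool and later to archive."""
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("cool threshold must be lower than archive threshold")
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            # Apply the rule only to block blobs under the given prefix.
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                }
            },
        },
    }


# Hypothetical rule: raw data cools down after 30 days and is archived after 180.
policy = {"rules": [lifecycle_rule("age-out-raw", "datalake/raw/", 30, 180)]}
print(json.dumps(policy, indent=2))
```

The resulting JSON can be pasted into the code view of the lifecycle management page of the storage account in the Azure portal, or applied with your deployment tooling of choice.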

Do you want to learn more about Azure Data Lake? Click on the button below and watch my recorded webinar “Data Lake – Grow Fearless”. Please contact me if you have any questions.