The Oracle Friday Tech Update: Big Data Management!

The Oracle Friday Tech Updates are an initiative from Oracle and are educational, practice-focussed sessions held on Friday. Every session offers an interesting close-up on a subject within Oracle’s portfolio.

On the 7th of October 2016, I attended the Oracle Friday Tech Update on the subject of Big Data Management. During this day, Oracle provided an overview of a modern information management platform. All subjects were addressed by following the Information Management and Big Data Reference Architecture created by Oracle. This architecture was first published in 2014 and is still relevant and up to date.

Components of this Oracle Big Data Management architecture are:

1. Fast Data

Focusses on streams of data to identify actionable events and then determine the next-best action.

2. Data Management

The Data Management section focusses on the Data Reservoir on Hadoop in combination with the Data Warehouse on relational databases. The factory sits between the two to manage and orchestrate the data flowing between them.

3. Data Lab

Discovering new data sets before they are processed in the full Data Warehouse environment. The focus in this area is on rapid data provisioning.

4. Business Analytics

The analytical tools that consume data from the Data Management layer. They draw data not only from the Data Warehouse, but also from the Data Reservoir.

5. Apps

A collection of prebuilt adapters and Application Programming Interfaces (APIs) that enable all data sources and processing to be directly integrated into custom or packaged business applications.

The following subjects were discussed during this day about Big Data Management:

  1. Oracle Big Data & Data Warehousing Strategy
  2. Data Reservoir Best Practices
  3. Enterprise Information Store Best Practices
  4. Data Integration & Data Quality
  5. Oracle R and Oracle Advanced Analytics
  6. Oracle Spatial & Graph
  7. Internet of Things

This blog will share some highlights of each session.

Oracle Big Data & Data Warehousing Strategy

This session by John Abrahams was the opening session of the day, discussing Oracle’s strategy on Big Data and Data Warehousing. Highlights of Oracle Open World 2016 were shared as well.
One of the interesting subjects touched upon was the option to purchase Cloud Machines. These are the same machines Oracle uses in its public cloud, but located in the customer’s data center. The main advantage is that the data stays in your own data center while the machine is monitored by Oracle.

Another interesting cloud offering is the Oracle Database Exadata Express Cloud Service. This is a managed service running Oracle Database 12c Release 2 on the Oracle Exadata engineered system. This database has all options enabled, but is limited to 50 GB in size. Furthermore, the following interesting new features of Oracle Database 12c Release 2 were discussed:

  1. Optimizations of the Oracle Database In-Memory feature
    a. Fast start by storing data persistently.
  2. Enhancements to Oracle Partitioning
  3. Introduction of Analytic Views

As this database version is not yet officially released, more information will be made available later.

Interesting functions discussed during this session which were new to me:

1. APPROX_COUNT_DISTINCT
This function returns the approximate number of distinct values in a column. It is faster than an exact COUNT(DISTINCT ...) because it trades a small amount of accuracy for speed.

2. Row limiting clause
This clause makes it easier to return just a portion of a result set, for example the first ten rows with FETCH FIRST 10 ROWS ONLY, without writing analytic functions. Interesting and easy to use!
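
Coming back to APPROX_COUNT_DISTINCT: to make the trade-off between speed and accuracy a bit more concrete, here is a minimal Python sketch of one well-known approximation technique, K-minimum-values estimation. It is an illustration only and should not be read as Oracle's implementation; the function names, the parameter k and the sample data are all made up for this example.

```python
import hashlib
import heapq

HASH_SPACE = 2 ** 64  # hashes are treated as integers in [0, 2**64)


def hash64(value):
    """Map a value to a pseudo-uniform 64-bit integer (stable across runs)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def approx_count_distinct(values, k=1024):
    """Estimate the number of distinct values, keeping only k hashes in memory.

    Hypothetical helper for illustration; not Oracle's algorithm.
    """
    heap = []     # negated hashes: -heap[0] is the largest hash currently kept
    kept = set()  # the same hashes, for fast duplicate checks
    for value in values:
        h = hash64(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is among the k smallest: swap it in for the largest one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values seen: count is exact
    kth_smallest = -heap[0]
    # If the k-th smallest of n uniform hashes sits at fraction f of the
    # hash space, then n is roughly (k - 1) / f.
    return (k - 1) * HASH_SPACE // kth_smallest


if __name__ == "__main__":
    # Made-up data: 200,000 rows containing 50,000 distinct customer ids.
    rows = [i % 50_000 for i in range(200_000)]
    print("exact :", len(set(rows)))
    print("approx:", approx_count_distinct(rows))
```

With k = 1024 the estimate typically lands within a few percent of the exact count while only 1024 hashes are held in memory. That is the same kind of bargain the SQL function offers: a slightly imprecise answer in exchange for less work.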

Data Reservoir Best Practices

John continued by discussing best practices for the Data Reservoir in combination with the Data Warehouse. Strategies for Information Lifecycle Management were discussed. Data that resides in your Data Warehouse but is no longer hot can easily be offloaded to your Data Reservoir. This can be achieved by using Oracle Copy to Hadoop, which extracts the data from the database in Data Pump (DMP) format and stores it on HDFS.
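
As a rough illustration of such an Information Lifecycle Management rule, the Python sketch below picks the partitions that have gone cold and are therefore candidates for offloading. The partition names, the dates and the 365-day threshold are hypothetical; the offload itself would be done with a tool such as Oracle Copy to Hadoop and is not shown here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Partition:
    name: str
    last_accessed: date


def offload_candidates(partitions, today, max_idle_days=365):
    """Return partitions not accessed within max_idle_days (assumed policy)."""
    cutoff = today - timedelta(days=max_idle_days)
    return [p for p in partitions if p.last_accessed < cutoff]


if __name__ == "__main__":
    # Made-up partitions of a hypothetical SALES table.
    partitions = [
        Partition("SALES_2013", date(2014, 2, 1)),
        Partition("SALES_2015", date(2016, 1, 15)),
        Partition("SALES_2016", date(2016, 10, 1)),
    ]
    for p in offload_candidates(partitions, today=date(2016, 10, 7)):
        print("offload to the Data Reservoir:", p.name)
```

In practice the definition of "hot" depends on your own access patterns, so the threshold is the part to tune.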

At the end of this session, Wim Villano showed the Oracle Big Data Discovery application. I have already worked with this tool and find it very easy to use on top of data residing in HDFS. The version Wim showed already had some enhancements compared to the one I used, and it still looks very promising for exploring your data easily.

Data Integration & Data Quality

Marti Koppelmans explained the benefits of using Oracle Data Integrator (ODI) for moving data from the Data Reservoir into the Data Warehouse. Thanks to its knowledge modules, ODI is very flexible and makes it easy to generate Hive, Pig or Flume code.

Oracle R and Oracle Advanced Analytics

After the Data Integration part, Marti moved on to the analytics side, including Oracle R. Marti explained that the strategy of Oracle has always been: process near the data, don’t extract the data and process it somewhere else. This is also what Oracle did with R: install it in the database to make it scalable and guarantee performance.
If you want to know more on this topic, please read the book Oracle R Enterprise by Brendan Tierney.

Oracle Spatial & Graph

In this session, Shintaro Nagaoka explained the rebranding of Oracle Spatial to Oracle Spatial & Graph, as well as the different ways it is implemented within the Oracle Database and in the Oracle Big Data solution. The Oracle Spatial technology has been around for more than 20 years and makes it possible to analyze data geographically.
One of the interesting parts mentioned in this session was Oracle Multimedia Analytics, something I wasn’t aware of at the time. This product is a framework that enables video and image processing to run within the Hadoop environment; it enables you to detect faces in videos and images.
If you would like to know more on this, download the Oracle Big Data Lite Virtual Machine from this website. More information can be found on the following blog.

Internet of Things

Eugene Bogaart explained the use of the Oracle Internet of Things Cloud Service. He demonstrated this service with his own coffee machine, which sends data to the IoT service. The demo itself was really easy to understand and well explained. It also became clear that once a machine has a sensor, connecting it to the cloud service to consume the data is easy. Integrating this data with the Oracle BI Cloud Service was also straightforward, giving you a lot of opportunities to work with IoT data in the Oracle cloud.

This session about Oracle Big Data Management was really interesting to attend and gave me a great overview of the Oracle Big Data Management options. Oracle has many solutions available right now to help you manage your environment, either in the cloud or on premises. If you would like to know more, feel free to contact us and we would be happy to help!

devoteam