
Anonymizing a column in Spotfire

Anonymizing data is a must in every organization nowadays. Having trouble anonymizing a column in Spotfire? Then read this step-by-step guide. Consider the following situation: you have one table coming from a database (the type doesn’t matter). This table contains production information about people.

To keep it simple:

Name   Date      Production
Eric   2-2-2018  100
Marie  2-2-2018  90
Eric   2-3-2018  95
John   2-3-2018  99
Eric   2-3-2018  110

The challenge

In Spotfire you want to report on production, but you are not allowed to report on individual people, so you want to group them into departments. The departments are not in the database yet, but in this case we have a spreadsheet that maps each name to a department:

Name   Dept
Eric   Dept A
Marie  Dept B
John   Dept A
Kim    Dept B

The only proper way to join these tables is to load both into Spotfire and join on Name. However, the customer explicitly told us that the names must not be recognizable, as that could be a breach of privacy.

So, I started to look for a way to scramble the Name in both tables in such a way that the names can no longer be recognized but can still be used as a join key. I decided to turn to Steven van der Kroft at TIBCO, and together we arrived at a workable solution. Thanks, Steven, for your help!

How to anonymize data in Spotfire

So, what did we do? Steven created a small R script for me that did exactly what I was looking for:

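library(data.table)  # provides data.table()
# hash_names() is not base R: load the package that provides it as well.
df <- data.table(Tbl)
# Replace each name with a 25-character hash; equal names yield equal hashes.
df$Name <- hash_names(df$Name, size = 25, full = FALSE)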

(If you want to run this with live data on a Web Player, this does require a Statistical Server, obviously.)

In the Input Parameters, define Tbl as a Table, and in the Output Parameters, define df as a Table. In this example the column name “Name” is hard-coded, so it must match the name of the column in your table exactly.
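If hard-coding the column name is too rigid, the same idea can be sketched with the column name passed in as an extra input. This is a minimal sketch under assumptions, not the script Steven wrote: ColName is a hypothetical additional input of type Value, hash_column is a hypothetical helper, and it swaps in digest() from the digest package (SHA-256, cut to 25 characters to mirror size = 25 above) in place of hash_names. It also assumes the digest package is available to the R engine your Spotfire installation uses.

```r
library(digest)  # assumption: the digest package is installed

# Hypothetical stand-ins for the Spotfire inputs, so the sketch runs on its own.
# Inside a data function, Tbl and ColName would arrive as Input Parameters.
Tbl <- data.frame(
  Name = c("Eric", "Marie", "Eric"),
  Production = c(100, 90, 95),
  stringsAsFactors = FALSE
)
ColName <- "Name"  # hypothetical Value input naming the column to scramble

# Hash every value of a vector and keep the first `size` hex characters.
hash_column <- function(x, size = 25) {
  hashes <- vapply(
    as.character(x),
    function(v) digest(v, algo = "sha256", serialize = FALSE),
    character(1),
    USE.NAMES = FALSE
  )
  substr(hashes, 1, size)
}

# Output table: same rows, with the chosen column replaced by its hashes.
df <- Tbl
df[[ColName]] <- hash_column(df[[ColName]])
```

When used as a data function, the two stand-in assignments at the top would be dropped, since Spotfire supplies those values.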

When adding a data table you can add this script as a Transformation. The result for the first table would look something like this:

Name                       Date              Production
02137e6f7e7a5de5f857f2717  2-2-2018 0:00:00  100
3b4e11cb98763528a107961e5  2-2-2018 0:00:00  90
02137e6f7e7a5de5f857f2717  2-3-2018 0:00:00  95
5b8c08b560f7be7eee35e63bf  2-3-2018 0:00:00  99
02137e6f7e7a5de5f857f2717  2-3-2018 0:00:00  110

And for the second:

Name                       Dept
02137e6f7e7a5de5f857f2717  Dept A
3b4e11cb98763528a107961e5  Dept B
5b8c08b560f7be7eee35e63bf  Dept A
7d44a8b04600391d020ae6f81  Dept B

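As you can see, the names cannot be recognized anymore, but we can still use them to join on! Just make sure you do not run the script twice on the same table, since hashing an already hashed name gives a different value that no longer matches the other table.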
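To try the join outside Spotfire, here is a small sketch in plain R using the two example tables from this post. It is an illustration under assumptions, not the script used above: scramble is a hypothetical helper built on digest() from the digest package, standing in for hash_names, so the tokens it produces will differ from the ones shown in the tables.

```r
library(digest)  # assumption: the digest package is installed

# Hypothetical stand-in for hash_names(): deterministic 25-character tokens.
scramble <- function(x, size = 25) {
  hashes <- vapply(
    as.character(x),
    function(v) digest(v, algo = "sha256", serialize = FALSE),
    character(1),
    USE.NAMES = FALSE
  )
  substr(hashes, 1, size)
}

production <- data.frame(
  Name = c("Eric", "Marie", "Eric", "John", "Eric"),
  Production = c(100, 90, 95, 99, 110),
  stringsAsFactors = FALSE
)
departments <- data.frame(
  Name = c("Eric", "Marie", "John", "Kim"),
  Dept = c("Dept A", "Dept B", "Dept A", "Dept B"),
  stringsAsFactors = FALSE
)

# Scramble the key once in each table: equal names yield equal tokens,
# so the join still lines up and production can be summed per department.
production$Name <- scramble(production$Name)
departments$Name <- scramble(departments$Name)
joined <- merge(production, departments, by = "Name")
per_dept <- aggregate(Production ~ Dept, data = joined, FUN = sum)

# Scramble one table a second time: its tokens change and nothing matches.
production_twice <- production
production_twice$Name <- scramble(production_twice$Name)
broken <- merge(production_twice, departments, by = "Name")
```

In this sketch joined keeps all five production rows with a department attached, while broken comes back with no rows at all, which is the effect of running the scrambling step twice on only one side of the join.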

Get in touch!

For more information about TIBCO Spotfire, feel free to contact Eric Hans van Wingerden (eric.hans.van.wingerden@devoteam.com), Senior Spotfire Consultant at Devoteam Netherlands.
