Anonymizing data is nowadays a “must” in every organization. Having trouble anonymizing a column in Spotfire? Then read this step-by-step manual. Consider the following situation: you have a table coming from a database (it doesn’t matter what type) that contains production information about people.
To keep it simple:
Name | Date | Production |
Eric | 2-2-2018 | 100 |
Marie | 2-2-2018 | 90 |
Eric | 2-3-2018 | 95 |
John | 2-3-2018 | 99 |
Eric | 2-3-2018 | 110 |
… |
The challenge
In Spotfire you want to report on production, but you are not allowed to report on individual people, so you want to group them into departments. These departments are not in the database yet, but in this case we have them in a spreadsheet:
Name | Dept |
Eric | Dept A |
Marie | Dept B |
John | Dept A |
Kim | Dept B |
… |
The only proper way to join these tables is to load both into Spotfire and join them on Name. However, the customer explicitly tells us that names must not be recognizable, as that could be a breach of privacy.
So I started looking for a way to scramble the Name column in both tables so that the names can no longer be recognized but can still be used to join on. I turned to Steven van der Kroft at TIBCO, and together we came up with a workable solution. Thanks, Steven, for your help!
How to anonymize data in Spotfire
So, what did we do? Steven created a small R script for me that did exactly what I was looking for:
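library(data.table)
library(epitrix) # assumed source of hash_names()
df <- data.table(Tbl) # Tbl: the input table passed in by Spotfire
df$Name <- hash_names(df$Name, size = 25, full = FALSE) # one 25-character hash per name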
(If you want to run this on live data in the Web Player, this does require a Statistics Server.)
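Why `size = 25`? Assuming the 25 characters are hexadecimal (as in the example output further down) and that the hash spreads names roughly uniformly, the birthday bound gives an estimate of the chance that two different names end up with the same hash, which would wrongly merge two people in the join:

```latex
N = 16^{25} = 2^{100}
P_{\text{collision}}(n) \approx 1 - e^{-n(n-1)/(2N)} \approx \frac{n^2}{2^{101}}
P_{\text{collision}}(10^{6}) \approx \frac{10^{12}}{2^{101}} \approx 2^{-61} \approx 4 \times 10^{-19}
```

So even for a million distinct names an accidental collision is not a practical concern at this length; much shorter hashes raise that risk quickly.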
Under Input Parameters, define Tbl as a Table; under Output Parameters, define df as a Table. In this example the column name “Name” is hard-coded, so it must match the name of the column in your data exactly.
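Because “Name” is hard-coded, the script only works for a column with exactly that name. A minimal sketch of a more reusable variant, using a hypothetical helper `anonymize_column()` and a hypothetical extra String input parameter `colName`, passes the column name and the hash function in as arguments. It also skips values that already look like 25-character hex hashes, so an accidental second run leaves them unchanged:

```r
# Hypothetical helper: anonymize a column given by name instead of a hard-coded "Name".
# hash_fun is any deterministic function that returns one hash per input value,
# e.g. (assuming the epitrix package):
#   function(x) epitrix::hash_names(x, size = 25, full = FALSE)
anonymize_column <- function(tbl, col, hash_fun, hash_length = 25) {
  values <- as.character(tbl[[col]])
  # Skip values that already look like hex hashes of the expected length,
  # so running the transformation twice does not hash the hashes.
  already_hashed <- grepl(sprintf("^[0-9a-f]{%d}$", hash_length), values)
  if (any(!already_hashed)) {
    values[!already_hashed] <- hash_fun(values[!already_hashed])
  }
  tbl[[col]] <- values
  tbl
}

# Usage inside the data function (colName is a hypothetical String input parameter):
# df <- anonymize_column(data.frame(Tbl), colName,
#                        function(x) epitrix::hash_names(x, size = 25, full = FALSE))
```

Passing the hash function in keeps the helper independent of any particular package. The guard is only a heuristic: a genuine value that happens to consist of exactly 25 hex characters would be skipped as well.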
When adding a data table, you can add this script as a Transformation. The result for the first table would look something like this:
Name | Date | Production |
02137e6f7e7a5de5f857f2717 | 2-2-2018 0:00:00 | 100 |
3b4e11cb98763528a107961e5 | 2-2-2018 0:00:00 | 90 |
02137e6f7e7a5de5f857f2717 | 2-3-2018 0:00:00 | 95 |
5b8c08b560f7be7eee35e63bf | 2-3-2018 0:00:00 | 99 |
02137e6f7e7a5de5f857f2717 | 2-3-2018 0:00:00 | 110 |
And for the second:
Name | Dept |
02137e6f7e7a5de5f857f2717 | Dept A |
3b4e11cb98763528a107961e5 | Dept B |
5b8c08b560f7be7eee35e63bf | Dept A |
7d44a8b04600391d020ae6f81 | Dept B |
As you can see, the names can no longer be recognized, but we can still use them to join on! Just make sure you do not run the script twice: hashing names that are already hashed produces new values that no longer match the other table.
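In Spotfire itself you simply join the two hashed tables on Name. The following base R sketch only mirrors that logic outside Spotfire, using the hashed values from the example tables above: an inner join on Name, followed by production totals per department, which is the report that no longer exposes individual names.

```r
# Hashed tables, with values copied from the example output above.
production <- data.frame(
  Name = c("02137e6f7e7a5de5f857f2717", "3b4e11cb98763528a107961e5",
           "02137e6f7e7a5de5f857f2717", "5b8c08b560f7be7eee35e63bf",
           "02137e6f7e7a5de5f857f2717"),
  Production = c(100, 90, 95, 99, 110)
)
departments <- data.frame(
  Name = c("02137e6f7e7a5de5f857f2717", "3b4e11cb98763528a107961e5",
           "5b8c08b560f7be7eee35e63bf", "7d44a8b04600391d020ae6f81"),
  Dept = c("Dept A", "Dept B", "Dept A", "Dept B")
)

# Join on the hashed Name: the same name produced the same hash in both tables.
joined <- merge(production, departments, by = "Name")

# Report per department; the hashed Name column is not needed for this.
per_dept <- aggregate(Production ~ Dept, data = joined, FUN = sum)
print(per_dept) # Dept A: 404, Dept B: 90
```

The last hash in the department table has no production rows, so the inner join drops it; pass `all = TRUE` to `merge()` for a full outer join if unmatched rows should be kept.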
Get in touch!
For more information about TIBCO Spotfire, feel free to contact Eric Hans van Wingerden (eric.hans.van.wingerden@devoteam.com), Senior Spotfire Consultant at Devoteam Netherlands.