How to know who is who in Big Data?

By carolc

Big Data is nothing new. The term was born in the 90s, although we must admit that its popularity exploded in the 2010s. It is closely related to business intelligence, and we know that large corporations cannot live without it, but we have also been told that small and medium-sized companies shouldn’t live without it either. And then we, who did well in databases at college, who like to sort, classify and discover what is happening thanks to data, and who always knew that information is power, see an opportunity here and feel compelled to enter this world, because this is what we want to do. And that is when the questions arise: who is who in Big Data? What exactly do we want to do? What’s a Data Engineer? And a Data Scientist? What does a Business Analyst do? And the doubts just grow and grow.

There is no need to panic. We’ve got your back: we are going to break down for you who is who in the Big Data world. But first, some disclaimers. This is a really wide world, and it is still growing, so you can find many different job roles and job descriptions out there, and different companies often give the same role different names. On top of that, bigger companies have more specific roles for more specific jobs. To keep it simple, we’re going to focus on the main areas of expertise closely related to Big Data activities, so jobs like backend developer (which can definitely bring value to Big Data tasks) will not be described. That said, let’s go.

Business Analyst

A Business Analyst (BA) is in charge of knowing the business from a Big Data perspective, and this is mainly about metrics. In simple terms, metrics are values that can be calculated from the data we have available and that give us an idea of whether things are going well in our business or not.
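To make "metric" a bit more concrete, here is a minimal sketch in Python. The Order record, the sample orders and the two metrics chosen (average order value and repeat customer rate) are our own assumptions for illustration, not a recipe every business should follow.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record; a real system will have many more fields."""
    customer_id: int
    total: float


def average_order_value(orders):
    """Total revenue divided by the number of orders (0.0 if there are none)."""
    if not orders:
        return 0.0
    return sum(order.total for order in orders) / len(orders)


def repeat_customer_rate(orders):
    """Share of customers who placed more than one order."""
    orders_per_customer = {}
    for order in orders:
        orders_per_customer[order.customer_id] = (
            orders_per_customer.get(order.customer_id, 0) + 1
        )
    if not orders_per_customer:
        return 0.0
    repeat_customers = sum(1 for n in orders_per_customer.values() if n > 1)
    return repeat_customers / len(orders_per_customer)


# Made-up sample data, only to show the calculation.
orders = [Order(1, 20.0), Order(2, 35.0), Order(1, 15.0), Order(3, 50.0)]

print(average_order_value(orders))   # 120.0 in revenue over 4 orders
print(repeat_customer_rate(orders))  # 1 of 3 customers ordered more than once
```

Neither number means much on its own. Deciding which metrics matter, and what counts as a good value, is exactly the BA’s job.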

The BA is in charge of identifying or even defining these metrics, taking into account the particularities of our business, selecting the ones that fit us best, and looking for an appropriate model for each metric. To do this, they must have the skills to interview different members of the organization and extract the knowledge needed to define metrics and models.

But metrics and models are nothing without proper evaluation. In the scientific world everything has to be quantifiable and verifiable, so the BA must have a way to verify that the applied models are actually providing valuable knowledge.

Finally, the BA must be able to communicate the knowledge obtained from data analysis. That means preparing data visualization presentations for the staff members who require them, and conveying the current state of the business, the path it is taking and the focal points that need attention in order to optimize processes. All this must be done in a way that management and other stakeholders can easily understand, so they can take the right actions.

Data Engineer

We already have the metrics that matter to us and the methods to obtain them, so now we only need the data. No problem! We take some backups of the different corporate databases and get to work, right? Wrong! Nothing could be further from reality.

Data is messy. Analysis tools require some consistency in the data in order to function properly, and working with multiple databases from different systems only makes it worse: data that represents the same thing but is stored in a different format in each system, data that is essential in one system but irrelevant in another, and so on. A normal day for a Data Engineer (DE) can go something like this. DE: “What do you store when the data does not exist?” DBA1: “Here we put null.” DBA2: “Here we put a period, because the system does not handle nulls.” DBA3: “What does null mean?”

If working with multiple databases makes things more complicated, then let’s not complicate things and just use the sales database, right? Wrong again! Very wrong. Remember, this is BIG Data: the more data we handle, the higher the quality of the knowledge we generate. So, who saves us from all this? The Data Engineers do. They are in charge of extracting the necessary information from all the databases involved (sometimes even from that Excel file Jerry updates every morning), cleaning and homogenizing the data (seriously guys, tell me, what does null mean?) and placing it in a structure designed for data analysis, because the transactional databases commonly used by systems are not the ideal environment for these analysis tasks.
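The “what does null mean?” problem can be sketched in a few lines of Python. This is only an illustration: the three source rows, the field names and the list of missing-value markers are hypothetical, and a real pipeline would agree on these rules with the owner of each system.

```python
# Markers that different (hypothetical) source systems use for "no value".
MISSING_MARKERS = {"", ".", "null", "NULL", "N/A"}


def normalize_value(value):
    """Map every known 'missing' marker to None and trim stray whitespace."""
    if value is None:
        return None
    if isinstance(value, str):
        stripped = value.strip()
        if stripped in MISSING_MARKERS:
            return None
        return stripped
    return value


def normalize_record(record):
    """Lower-case the field names and clean every value in the record."""
    return {key.lower(): normalize_value(value) for key, value in record.items()}


# The same customer as three made-up systems might store it.
crm_row = {"Name": "Ana", "Phone": None}                # uses real nulls
billing_row = {"NAME": " Ana ", "PHONE": "."}           # cannot store nulls
spreadsheet_row = {"name": "Ana", "phone": "N/A"}       # Jerry's Excel file

cleaned = [normalize_record(row) for row in (crm_row, billing_row, spreadsheet_row)]
for row in cleaned:
    print(row)
```

After this step all three rows look the same, which is what the analysis tools downstream need.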

Data Analyst

Once the data is correctly defined, ambiguities and inconsistencies have been eliminated and everything has been stored in structures appropriate for analysis, the Data Analyst (DA) comes along. DAs are in charge of running analysis tools on the data and interpreting the results. They are the SQL masters, and they can recombine data to improve their analyses. Besides generating these analyses, they have the immense responsibility of interpreting the results and extracting valuable knowledge from them. They must know a thing or two about data visualization, identifying patterns, correlations and other cool stuff. Together with the BA, they are in charge of transforming numbers, peaks, graphs and correlations into things like “We must offer our married clients between 30 and 38 years old a 15% discount on white wine on Thursday nights.”
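Here is a rough idea of the kind of query that could sit behind a recommendation like the white wine one. It is a sketch using Python’s built-in sqlite3 module; the purchases table, its columns and its five rows are invented for the example, and a real analysis would need far more data before anyone offers a discount.

```python
import sqlite3

# In-memory database with a hypothetical, already cleaned purchases table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE purchases (
        age INTEGER,
        married INTEGER,   -- 1 = married, 0 = not married
        weekday TEXT,
        product TEXT,
        amount REAL
    )
    """
)
rows = [
    (32, 1, "Thursday", "white wine", 18.0),
    (35, 1, "Thursday", "white wine", 22.0),
    (36, 1, "Monday", "white wine", 10.0),
    (45, 0, "Thursday", "white wine", 12.0),
    (31, 1, "Thursday", "red wine", 30.0),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?, ?)", rows)

# Compare Thursday white wine purchases for one segment against everyone else.
query = """
    SELECT
        CASE
            WHEN age BETWEEN 30 AND 38 AND married = 1 THEN 'married 30-38'
            ELSE 'everyone else'
        END AS segment,
        COUNT(*) AS purchases,
        AVG(amount) AS avg_amount
    FROM purchases
    WHERE product = 'white wine' AND weekday = 'Thursday'
    GROUP BY segment
    ORDER BY segment
"""
results = conn.execute(query).fetchall()
for segment, purchases, avg_amount in results:
    print(segment, purchases, avg_amount)
```

The query only produces the numbers. Turning them into a sentence the sales team can act on is the interpretation part of the job.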

Data Scientist

The Data Scientist (DS), as you can imagine, plays in the big leagues. They are the ones who actually build data analysis models, using tools such as Matlab, R or SAS. They have a solid understanding of statistics and data mining techniques. They are clearly the “scientists” of the bunch: Bayesian learning, probabilistic models and machine learning are routine things for them. Of course, they also have some knowledge of programming languages such as Python, Java or JavaScript. As we mentioned, they are in charge of designing and building the models that the DA will apply to the data, but they don’t only work on business-oriented models (increase sales, reduce costs, improve production chains). They also do nicer things, such as natural language processing for customer service, or sentiment analysis on social networks to see how our community perceives our brand. Cool, huh?
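To give a taste of what “Bayesian learning” for sentiment analysis can look like, here is a toy Naive Bayes classifier written with nothing but the Python standard library. It is a sketch under heavy assumptions: the four training sentences are made up, the tokenizer just splits on spaces, and a real DS would train on thousands of labeled messages, most likely with an established library.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label; returns (word_counts, label_counts, vocabulary)."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_examples in label_counts.items():
        # Log prior: how common this label is in the training data.
        score = math.log(n_examples / total_examples)
        # Laplace (add-one) smoothing so unseen words don't zero out a label.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up training set, for illustration only.
examples = [
    ("love this brand", "positive"),
    ("great service and great wine", "positive"),
    ("terrible service", "negative"),
    ("hate the long wait", "negative"),
]
model = train(examples)

print(predict(model, "great wine"))
print(predict(model, "terrible wait"))
```

With this training set the first message is classified as positive and the second as negative. Scale the same idea up to real social network data and you get a first picture of how the community perceives the brand.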

And that’s it! Now you know who is who in Big Data. But remember, these profiles overlap at the borders: a DS, a DA and a DE should all know how to create a database schema, write queries that combine data, or fill in empty fields, even though each of these tasks falls mainly to one of the profiles. Here we have tried to present the most representative tasks of each one.

Now you are ready to decide who you want to be in the Big Data world!

