How to know who is who in Big Data?

August 19, 2020 By carolc

Big Data is nothing new. The term was born in the 90s, although its popularity exploded in the 2010s. It is closely related to business intelligence, and we know that large corporations cannot live without it, but we have also been told that small and medium-sized companies shouldn’t live without it either. And so we, who did well in databases at college, who like to sort, classify and discover what is happening thanks to data, and who always knew that information is power, see an opportunity here and feel compelled to enter this world, because this is what we want to do. And that is when the questions arise: who is who in Big Data? What exactly do we want to do? What’s a Data Engineer? And a Data Scientist? What does a Business Analyst do? And the doubts just grow and grow.

There is no need to panic. We’ve got your back: we are going to break down who is who in the Big Data world. But first, some disclaimers. This is a really wide world and it is still growing, so you will find many different job roles and job descriptions out there, and different companies give the same role different names. On top of that, bigger companies have more specific roles for more specific jobs. To keep it simple, we’re going to focus on the main areas of expertise closely related to Big Data activities, so jobs like backend developer (who could definitely bring value to Big Data tasks) will not be described. That said, let’s go.

Business Analyst

A Business Analyst (BA) is in charge of knowing the business from a Big Data perspective, and this is mainly about metrics. In simple terms, metrics are values that can be calculated from the data we have available and that give us an idea of whether things are going well in our business or not.
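To make the idea of a metric concrete, here is a minimal sketch in Python. The order records and the two metrics chosen (average order value and repeat customer rate) are hypothetical examples, not metrics every business should use; they only illustrate that a metric is a value calculated from the data already available.

```python
from collections import Counter

# Hypothetical order records: (customer_id, order_total)
orders = [
    ("c1", 40.0),
    ("c2", 25.0),
    ("c1", 35.0),
    ("c3", 60.0),
]

def average_order_value(orders):
    """Total revenue divided by the number of orders."""
    return sum(total for _, total in orders) / len(orders)

def repeat_customer_rate(orders):
    """Share of customers who placed more than one order."""
    counts = Counter(customer for customer, _ in orders)
    repeaters = sum(1 for n in counts.values() if n > 1)
    return repeaters / len(counts)

aov = average_order_value(orders)
rate = repeat_customer_rate(orders)
print(f"Average order value: {aov:.2f}")
print(f"Repeat customer rate: {rate:.0%}")
```

Which metrics actually matter is precisely what the BA has to find out by talking to the business.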

The BA is in charge of identifying or even defining these metrics, taking into account the particularities of our business, selecting the ones that fit us best, and looking for an appropriate model for each metric. To do this, they must have the skills to interview different members of the organization and extract the knowledge needed to define metrics and models.

But metrics and models are nothing without proper evaluation. In the scientific world everything has to be quantifiable and verifiable, so the BA must have a way to verify that the applied models are actually providing valuable knowledge.

Finally, the BA must be able to communicate the knowledge obtained from data analysis. That means preparing data visualization presentations for the staff members who need them and conveying the current state of the business, the path it is taking and the focal points that must be addressed to optimize processes, all in a way that management and other stakeholders can easily understand in order to take the right actions.

Data Engineer

We already have the metrics that matter to us and the methods to obtain them; now we only need the data. No problem! We make some backups of the different corporate databases and get to work, right? Wrong! Nothing could be further from reality.

Data is messy. Analysis tools require some consistency in the data in order to function properly, and working with multiple databases from different systems only makes things worse: data that represents the same thing but is stored in a different format in each system, data that is essential in one system but irrelevant in another, and so on. A normal day for a Data Engineer (DE) can go something like this. DE: “What if the data does not exist?” DBA1: “Here we put null.” DBA2: “Here we put a period, because the system does not handle nulls.” DBA3: “What does null mean?”

If working with multiple databases makes things more complicated, then let’s not complicate our lives and just use the sales database, right? Wrong again! Very wrong. Remember, this is BIG Data: generally, the more data we handle, the higher the quality of the knowledge we generate. So, who saves us from all this? The Data Engineers do. They are in charge of extracting the necessary information from all the databases involved (sometimes even from that Excel file Jerry updates every morning), cleaning and homogenizing the data (seriously guys, tell me what null means?) and loading it into a structure designed for data analysis, because the transactional databases commonly used by systems are not the ideal environment for these analysis tasks.
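The “what does null mean?” problem can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three source systems, their field names and the set of placeholders they use for a missing value. A real pipeline would agree on those markers with each system’s owner.

```python
# Placeholders that the hypothetical source systems use for "no value"
MISSING_MARKERS = {"", ".", "null", "n/a"}

def normalize_missing(value):
    """Return None for any known 'missing' placeholder, else the trimmed text."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return text

def homogenize(record):
    """Apply the same missing-value rule to every field of a record."""
    return {field: normalize_missing(value) for field, value in record.items()}

# The same customer as stored by three hypothetical systems
crm_row = {"name": "Ana", "phone": None}       # uses a real null
billing_row = {"name": "Ana ", "phone": "."}   # uses a period
legacy_row = {"name": "Ana", "phone": "NULL"}  # uses the text "NULL"

cleaned = [homogenize(row) for row in (crm_row, billing_row, legacy_row)]
print(cleaned)
```

After this step all three rows look the same, which is what the analysis tools need.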

Data Analyst

Once the data is correctly defined, ambiguities and inconsistencies have been eliminated and everything has been stored in the appropriate structures for analysis, the Data Analyst (DA) comes along. They are in charge of running analysis tools on the data and interpreting the results. They are the SQL masters, able to recombine data to improve the analyses, and besides generating these analyses they have the immense responsibility of interpreting the results and extracting valuable knowledge from them. They must know a thing or two about data visualization, identifying patterns, correlations and other cool stuff. Together with the BA, they are in charge of transforming numbers, peaks, graphs and correlations into things like “We must offer our married clients between 30 and 38 years old a 15% discount on white wine on Thursday nights.”
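As a rough illustration of the kind of SQL a Data Analyst might run to reach a conclusion like the one above, here is a small sketch using Python’s built-in `sqlite3` module. The `sales` table, its columns and the rows in it are invented for this example.

```python
import sqlite3

# In-memory database with a hypothetical, already-cleaned sales table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(age INTEGER, married INTEGER, product TEXT, weekday TEXT, amount REAL)"
)
rows = [
    (32, 1, "white wine", "Thursday", 18.0),
    (35, 1, "white wine", "Thursday", 22.0),
    (45, 0, "white wine", "Thursday", 15.0),
    (31, 1, "red wine", "Monday", 30.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# What do married clients aged 30 to 38 buy, and on which day?
query = """
SELECT product, weekday, COUNT(*) AS purchases, SUM(amount) AS revenue
FROM sales
WHERE age BETWEEN 30 AND 38 AND married = 1
GROUP BY product, weekday
ORDER BY revenue DESC
"""
segment_summary = conn.execute(query).fetchall()
conn.close()

for product, weekday, purchases, revenue in segment_summary:
    print(product, weekday, purchases, revenue)
```

The query only produces numbers. Deciding that they justify a discount is the interpretation work the DA and the BA do together.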

Data Scientist

Data Scientists (DS), as you can imagine, play in the big leagues. They are the ones who actually build the data analysis models, using tools such as Matlab, R or SAS. They have a solid understanding of statistics and data mining techniques. They are clearly the “scientists” of the bunch: Bayesian learning, probabilistic models and machine learning are routine things for them. Of course, they also have some knowledge of programming languages such as Python, Java or JavaScript. As we mentioned, they are in charge of designing and building the models that the DA will apply to the data, but they do not only work with business-oriented models (increase sales, reduce costs, improve production chains). They also do nicer things, such as natural language processing for customer service or sentiment analysis on social networks to see how our community perceives our brand. Cool, huh?
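To give a flavor of what “building a model” for sentiment analysis can mean, here is a toy sketch of a Naive Bayes text classifier written in plain Python. The four training sentences and the function names are made up for illustration, and a real project would use far more data and an established library rather than this hand-rolled version.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label and how many examples each label has."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: how common this label is in the training data
        score = math.log(label_counts[label] / total_examples)
        words_in_label = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score
            score += math.log((counts[word] + 1) / (words_in_label + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled posts about our brand
training_posts = [
    ("love this brand", "positive"),
    ("great service love it", "positive"),
    ("terrible support", "negative"),
    ("hate the slow delivery", "negative"),
]
word_counts, label_counts = train(training_posts)

print(predict("love the service", word_counts, label_counts))
print(predict("terrible slow delivery", word_counts, label_counts))
```

Even this tiny version shows the division of labor: the DS designs and trains the model, and the DA later applies it to real data.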

And that’s it! Now you know who is who in Big Data. But remember, these profiles overlap at the borders: a DS, a DA and a DE must all know how to create a database schema, write queries that combine data, or fill in empty fields, even though each of these tasks belongs more specifically to one of the profiles. Here we have tried to present the most representative tasks directly related to each one of them.

Now you are ready to decide who you want to be in the Big Data world!

 
