No. But it will change how data analysts are employed.
Since the November 2022 release of ChatGPT, there has been growing discussion about whether generative AI will eventually replace data analysts. (ChatGPT, Bard, and Bing Chat are among the major large language models in this category.) Much of this speculation rests on the fact that these large language models (LLMs) can write code.
As someone who has spent most of my professional career in data analysis, I have naturally been curious about the effects of generative AI on our field. Giving in to that curiosity, I have spent a good amount of time evaluating its current capabilities in the context of data analysis.
I’ll summarize my research and share my conclusions in this article because I think generative AI will play a big part in future data analysis projects. I also think it is crucial for the data analyst community to understand the significant effects it will have on both their profession and the business world as a whole.
Where We Are Right Now
We currently know that generative AI can write SQL, Python, and R code. We can also expect the quality of that code to improve as the models are refined. But that’s only the beginning.
The Code Interpreter plugin for OpenAI’s ChatGPT was made available at the end of March 2023. If you are one of the few people who currently have access to the alpha version, you can upload data files and have it use Python to run regression and descriptive analyses, search for trends in your data, and even build visualizations. All of this without needing to write or even understand a single line of Python code! Ethan Mollick, a renowned professor at the Wharton School of Business, has a wonderful article on this.
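To make that concrete, here is a minimal sketch of the kind of Python such a tool might write behind the scenes for a descriptive summary and a simple linear regression. It is an illustration only, not Code Interpreter's actual output: the sample data, the column names (ad_spend, sales), and the helper functions are all invented for this example, and it uses only the Python standard library.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for an uploaded CSV file.
# The columns (month, ad_spend, sales) are made up for illustration.
SAMPLE_CSV = """month,ad_spend,sales
1,10,25
2,20,45
3,30,65
4,40,85
5,50,105
"""


def load_columns(text):
    """Parse the CSV text and return the two numeric columns as lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    ad_spend = [float(row["ad_spend"]) for row in rows]
    sales = [float(row["sales"]) for row in rows]
    return ad_spend, sales


def describe(values):
    """Return basic descriptive statistics for one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend, sales = load_columns(SAMPLE_CSV)
summary = describe(sales)
slope, intercept = fit_line(ad_spend, sales)

print(summary)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")
```

The code itself is the easy part. Knowing that ad spend is the right variable to test against sales, and whether the resulting relationship makes business sense, is still up to the analyst.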
There you have it, then. Without writing a single line of code, it is possible to load, analyze, and present data. Game over, right? Not so fast.
Impressive as these capabilities are, Code Interpreter has several important limitations that highlight some of the difficulties generative AI would face in taking over data analysis work.
First, you must upload ONE table: a single two-dimensional CSV file, with a maximum size of 100 MB. Putting the size restriction aside, picture having to create a single table containing all of your company’s data.
I suppose I could stop there, but let’s continue.
Once you have your one table, you must now obtain permission to push it outside of your company’s firewall and into an LLM that your company does not control. This one table contains ALL of your company’s data.
I think we can stop there.
The current alternative would be for your business to develop its own LLM (more on this later). Although theoretically feasible, this would be cost-effective for only a very small number of businesses, given the difficulty of training and fine-tuning a model, the expertise required, and the massive costs involved.
But for the sake of argument, let’s assume that your organization is on that list.
First, some context. When business intelligence tools were first introduced in the early 2000s, their greatest value lay in allowing non-technical, line-of-business personnel to capitalize on their domain expertise by selecting, analyzing, and presenting data without writing a single line of code. Sound familiar?
Providing simple tools for data analysis is nothing new, and it will always be valuable. In fact, it is a multi-billion dollar industry that is still expanding. However, without domain expertise, these tools are useless. This holds true regardless of the tool used for the analysis, even if that tool is generative AI. Without domain expertise, we cannot formulate meaningful questions to ask of our data. And even if the answers were handed to us, how would we interpret the results?
And in my opinion, the ability to answer ad hoc questions is where data analysis work adds the most value: unexpected, mission-critical questions that are nonlinear, multi-layered, and complex. Answering them requires domain expertise.
For instance, why did sales of our top-selling product suddenly plummet? What do we do now that our main supplier has recently gone out of business? Why did last month’s customer churn rate double? These are not simple problems that can be answered by following a pre-established decision tree.
These examples all have one thing in common: they call for prompt responses to novel, situational questions. That is the crucial factor. If you understand how generative AI works, you will realize that this is its Achilles’ heel when it comes to completely replacing data analysts.
To put it briefly, generative AI uses existing data sets to ‘train’ an LLM, which then produces a probability-driven response based on the training data it has been fed. And even though you can continuously improve your model with ever more precise data sets, how would you train it on complex, hypothetical scenario questions?
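As a rough illustration of what "probability-driven" means, here is a toy sketch of a word-pair (bigram) model. This is a deliberate simplification and an assumption on my part: real LLMs are large neural networks, not lookup tables, and the training sentence and function names below are invented for the example. The underlying point carries over, though: the model can only assign probabilities to continuations that resemble what it saw during training.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny "training set" invented for this example.
TRAINING_TEXT = (
    "sales rose in march . sales fell in april . churn rose in april ."
)


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts.get(word)
    if not followers:
        # The word never appeared in training, so the model has nothing to say.
        return {}
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()}


model = train_bigrams(TRAINING_TEXT)

# A word seen in training gets a probability distribution over next words.
print(next_word_probabilities(model, "sales"))

# A word never seen in training gets no answer at all.
print(next_word_probabilities(model, "supplier"))
```

In this toy, asking about "supplier" returns nothing because the training text never mentions it. A real LLM would still produce a fluent answer, but it would be drawing on general patterns rather than on knowledge of your specific, unprecedented situation.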
It would be like starting a new job as a data analyst in an industry you are not yet familiar with. On the first day, you are asked to immediately answer one of the questions above. How would you even begin? Which data would you retrieve? How would you even be aware of all the potential factors you would need to take into account? And even if you were able to come up with an answer, how would you know whether it was accurate?
For these reasons, I don’t think generative AI will ever completely take the place of data analysts. However, generative AI already has a wide range of applications in data analysis, and as its capabilities advance, those applications will only grow.
Uses of Generative AI in Data Analysis Right Now
The ability of generative AI to write code and then explain it (which it does fairly well) is currently its highest and best use in the field of data analysis. Personally, I’ve used it to make writing and understanding Python code easier.
I can’t stress enough how important it is for those of you who want to work in data analysis to use generative AI to help you learn to code. It would have significantly accelerated my own learning when I was just starting out in this field.
Generative AI has also driven the creation of specialized coding tools, which is another fascinating development for data analysts. Copilot, a new application from GitHub, can suggest ways to improve your code as you write it.
Earlier in this article, I mentioned the difficulties businesses can encounter when developing their own LLMs. That said, Databricks recently released an open-source LLM called “Dolly” as a potential alternative. In theory, it could address both the cost (being open source) and the need to push your data outside of your company’s firewall. It’s a more compact LLM that works well with narrow datasets.
I bring up Dolly largely to show how swiftly innovations in the field of generative AI are progressing and to alert readers to the potential future effects on the data analysis profession.
As we have already seen, AI will only continue to advance at a rapid pace.
Conclusion
I have no doubt that generative AI will change the way data analysis work is done. Over time, generative AI will take over repetitive tasks and even analysis. I could also see coding evolving from a highly specialized skill into more of a commodity.
Based on the above, I think the archetypal data analyst of the future will combine business-line domain expertise with the ability to use generative AI tools to make better use of their time.
On a personal note, I’d like to conclude by urging everyone who reads this to embrace generative AI. Explore it and put it to use in both your personal and professional lives. Its scope and capabilities will only expand as new APIs and plugins are continuously developed, for better or worse.