In this episode of Leading with Data, we explore the world of data science with Rohan Rao, a quadruple Kaggle Grandmaster and machine learning solutions expert. Rohan shares insights into strategic partnerships, the evolution of data tools, and the future of large language models (LLMs), and sheds light on the challenges and innovations shaping the industry.
You can listen to this episode of Leading with Data on popular platforms like Spotify, Google Podcasts, and Apple Podcasts. Pick the platform you prefer and enjoy the conversation!
Key insights from my conversation with Rohan Rao
- Strategic partnerships in competition can lead to memorable wins and learning experiences.
- Advances in data science tools require continuous learning and adaptation by practitioners.
- The future of LLMs may depend on new data sources and synthetic data generation.
- Companies want to integrate LLMs, but face challenges in applying them to their own data sets.
- A comprehensive framework for choosing an LLM can help businesses make informed decisions.
- Experimentation is key when choosing between traditional algorithms and generative AI to solve business problems.
- Proprietary LLMs with APIs often offer businesses a more convenient solution, even if they cost more.
- Responsible AI requires a multifaceted approach, including technology, policy, and regulation.
- Specialized AI agents are expected to help businesses set goals and solve problems efficiently.
Join us for our Leading with Data sessions, which feature insightful discussions with leaders in AI and data science!
Let’s take a closer look at our conversation with Rohan Rao!
How did you get into data science, and which competition is most memorable to you?
Kunal, thank you for inviting me to Leading with Data. My journey in data science started almost a decade ago and has been full of coding, hackathons, and competitions. It’s hard to pick a single standout competition, but one of my most memorable moments was teaming up with a strong competitor to win three times in a row at the Analytics Vidhya hackathon. It was a strategic move that paid off, and it remains a fond memory of my competitive days.
Looking at the trends, how has data science developed recently?
The field of data science has progressed through periods of incremental improvement punctuated by sudden leaps. XGBoost revolutionized predictive modeling, BERT transformed NLP, and more recently the release of ChatGPT marked a major milestone in demonstrating what LLMs can do. These advances have forced data scientists to keep adapting and upgrading their skills.
What are your predictions for the future of generative AI?
The trajectory of LLMs tends to show rapid initial improvement followed by a plateau, after which incremental gains become harder to achieve. LLMs have learned from vast amounts of Internet data, so future improvements may depend on new large-scale data sets or innovations in synthetic data generation. The computational resources available today are unprecedented, making innovation more accessible than ever.
How are companies adopting generative AI and LLMs?
Companies across a wide range of industries are eager to integrate LLMs into their operations. The challenge lies in combining these models with proprietary business data, which is usually far less extensive than the data on which LLMs are trained. At H2O.ai, a significant portion of our work focuses on enabling companies to leverage the power of LLMs with their own data sets.
What are the most common use cases you’ve seen across industries?
The most common question from businesses is how to get an LLM to learn from their specific data. The goal is to apply an LLM’s general capabilities to unique business challenges. This includes understanding the strengths and limitations of the model and integrating it with existing systems and data formats.
Can you share a framework for choosing the right LLM to suit your business needs?
Absolutely. The framework I presented at DataHack Summit includes 12 things to consider when choosing the LLM that is right for your business. These range from the functionality and accuracy of the model to scalability, cost, and legal considerations such as compliance and privacy. Evaluating these factors helps you determine which LLM best fits your business goals and constraints.
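One simple way to put a multi-criteria comparison like this into practice is a weighted scorecard. The sketch below is only an illustration and is not the framework Rohan presented: it uses five of the criteria named above, and the weights, candidate names, and ratings are all hypothetical placeholders that a team would replace with its own.

```python
from typing import Dict

# Hypothetical weights (they sum to 1.0). The criteria names come from the
# answer above; the numbers are illustrative assumptions only.
WEIGHTS: Dict[str, float] = {
    "accuracy": 0.30,
    "scalability": 0.15,
    "cost": 0.25,
    "compliance": 0.15,
    "privacy": 0.15,
}


def weighted_score(ratings: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of 0-10 ratings over the given criteria."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[name] * ratings[name] for name in weights)


# Hypothetical candidates with made-up ratings (10 = best for the business,
# so a high "cost" rating means the option is cheap, not expensive).
candidates = {
    "hosted_api_model": {
        "accuracy": 9, "scalability": 9, "cost": 4,
        "compliance": 6, "privacy": 5,
    },
    "self_hosted_open_model": {
        "accuracy": 7, "scalability": 5, "cost": 7,
        "compliance": 8, "privacy": 9,
    },
}

# Rank candidates from highest to lowest weighted score.
ranked = sorted(candidates,
                key=lambda name: weighted_score(candidates[name]),
                reverse=True)

for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Changing the weights changes the ranking, which is the point of the exercise: the scorecard makes the trade-offs between criteria explicit rather than deciding them.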
Should you choose traditional algorithms or generative AI?
The key is to experiment and iterate. While traditional algorithms like XGBoost have been the answer to many problems, LLMs offer new possibilities. By comparing performance on specific tasks, companies can determine which approach delivers better results and is more feasible to deploy and manage.
What should you consider when building an engineering solution centered around an LLM?
Choosing between a proprietary LLM accessed through an API and an open source LLM hosted on-premises is an important decision. Open source models can be cost-effective, but they come with hidden complexities such as infrastructure management and scalability. Companies often prefer API services for their convenience, despite the higher cost.
How do you tackle the challenges of responsible AI?
Responsible AI is a complex problem that goes beyond technological solutions. While there are safeguards and frameworks to prevent misuse, the unpredictable nature of LLMs makes them difficult to fully control. Solutions may include a combination of technological safeguards, government policies, and AI regulations to balance innovation and ethical use.
What do you think about leveraging AI agents in business?
I am very optimistic about the potential of AI agents. Expert agents can perform specific tasks with high accuracy, but it is difficult to integrate these micro-tasks into a broader solution. Some products can wrap existing LLMs with custom prompts, but truly specialized agents have the potential to revolutionize the way we approach problem solving in a variety of domains.
End Notes
As Rohan emphasizes, navigating the landscape of data science and generative AI requires continuous learning and experimentation. By embracing innovative frameworks and responsible AI practices, companies can harness the power of data to drive meaningful solutions and ultimately transform how they operate and compete in rapidly evolving markets.
For more exciting sessions on AI, data science, and GenAI, stay tuned to Leading with Data.
Check out our upcoming sessions here.