Thanks to the amazing success of AI, we’ve seen more and more organizations implement Machine Learning in their pipelines. As access to and collection of data increase, we have seen massive datasets being used to train giant deep learning models that reach superhuman performance. This has led to a lot of hype around domains like Data Science and Big Data, fueled even more by the recent boom in Large Language Models.
Big Tech companies (and Deep Learning experts on Twitter/YouTube) have really fallen in love with the ‘add more data, increase model size, train for months’ approach that has become the status quo in Machine Learning these days. However, heretics from Meta AI published research (surely funded by Satan), and it turns out this way of doing things is extremely inefficient, and often completely unnecessary. In this post, I will be going over their paper, Beyond neural scaling laws: beating power law scaling via data pruning, where they share evidence that selecting samples intelligently can increase your model’s performance without ballooning your costs out of control. While the paper focuses on Computer Vision, the principles of their research will be interesting to you regardless of your specialization. Read More
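The core idea of data pruning fits in a few lines. Here is a minimal sketch of my own (the function name, the difficulty score, and the toy data are all illustrative assumptions, not the paper's code): rank each training sample by some per-sample difficulty score and keep only a fraction of the dataset.

```python
import numpy as np

def prune_dataset(X, y, difficulty_scores, keep_fraction=0.5, keep="hard"):
    """Keep only a fraction of the training set, ranked by a per-sample
    difficulty score. The scoring function is up to you; the distance-based
    score used below is a stand-in, not the paper's exact metric."""
    n_keep = int(len(X) * keep_fraction)
    order = np.argsort(difficulty_scores)  # ascending: easy -> hard
    idx = order[-n_keep:] if keep == "hard" else order[:n_keep]
    return X[idx], y[idx]

# Toy usage: score each sample by its distance from the feature-space mean.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 16))
y = rng.integers(0, 10, size=1000)
scores = np.linalg.norm(X - X.mean(axis=0), axis=1)
X_small, y_small = prune_dataset(X, y, scores, keep_fraction=0.2)
```

One takeaway from the paper is that whether you should keep the "hard" or the "easy" samples depends on how much data you have, which is why the sketch exposes that as a parameter.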
Tag Archives: Data Science
State of Data Science 2022: Paving the Way for Innovation
Anaconda’s 2022 State of Data Science report is here! As with years prior, we conducted a survey to gather demographic information about our community, ascertain how that community works, and collect insights into big questions and trends that are top of mind within the community. As the impacts of COVID continue to linger and assimilate into our new normal, we decided to move away from covering COVID themes in our report and instead focus on more actionable issues within the data science, machine learning (ML), and artificial intelligence industries, like open-source security, the talent dilemma, ethics and bias, and more. Read More
Why it’s time for “data-centric artificial intelligence”
Machine learning pioneer Andrew Ng argues that focusing on the quality of data fueling AI systems will help unlock its full power.
The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.
Now it’s time to focus on the data that fuels these systems, according to AI pioneer Andrew Ng, SM ’98, the founder of the Google Brain research lab, co-founder of Coursera, and former chief scientist at Baidu.
Ng advocates for “data-centric AI,” which he describes as “the discipline of systematically engineering the data needed to build a successful AI system.” Read More
How to transition into a career in ML/AI
How to build a Data Analytics Pipeline on Google Cloud?
How To Become A Full Stack Data Scientist In 2022
2022 is here, and Data Science remains the sexiest and among the highest-paying jobs.
In 2021 and the years before it, Data Science saw rapid growth, especially during the peak of the COVID-19 pandemic, and many industries harnessed the power of Data Science to draw the most value from their products.
Many industries hired more people with Data Science and analytical skills than for any other role.
Not only did companies chase Data Scientists, but many people also jumped on the trend of becoming one. Some changed their profession entirely, like one of my students, Evelyn, who was a Marketing Manager (salary: $62,710) and is now a Data Scientist (salary: $123,444).
People often ask me: is Data Science going to remain attractive in 2022 and the upcoming years?
The answer is YES!! Read More
3 steps for creating a data-to-value ecosystem
The key to managing a mountain of data and disruptive technologies may lie in establishing a center of competency.
Although many organizations are using artificial intelligence (AI) and machine learning (ML) tools as core enablers in their data analytics projects, and AI spending worldwide continues to rise, the hard truth is that most data science projects are doomed to fail.
There are several reasons for these failures, ranging from the inherent complexity of AI/ML initiatives and the persistent lack of skilled talent to challenges in data security, governance, and data integration. These issues are collectively referred to as concerns about “data readiness,” according to an IDC global survey of more than 2,000 IT and line-of-business decision-makers, all of whom are involved in some level of AI use or development. Read More
Never invest your time in learning complex things.
The data scientist hype train has come to a grinding halt. It has been a joy ride for me, for I was one of the people who got hooked on data science as soon as it took off. Math, engineering, and the ability to predict stuff were very attractive indeed for a self-professed geek. I couldn’t resist, and soon I was devouring one book after another. I started with Springer publications (Max Kuhn, Trevor Hastie), a lot of O’Reilly books, and followed them up with statistics and math courses until I had the math and the techniques (Linear/Logistic Regression, SVMs, Random Forests, Decision Trees, and some 20 others) down pat. Sounds great, right? Not quite.
Then came the Deep Learning revolution. I was first exposed to it thanks to Jeremy Howard, who in my opinion still runs the best damn Deep Learning course on the internet. He explains vision, NLP, and even structured-data machine learning. The guy is literally able to translate gobbledygook for the masses (me :-)). Plug: https://www.fast.ai/ . Read More
How to Become Data Scientist – A Complete Roadmap
Markov models and Markov chains explained in real life: probabilistic workout routine
Markov defined a way to represent real-world stochastic systems and processes that encode dependencies and reach a steady state over time.
Andrei Markov didn’t agree with Pavel Nekrasov, who claimed that independence between variables was necessary for the Weak Law of Large Numbers to apply.
The Weak Law of Large Numbers states something like this:
When you collect independent samples, as the number of samples gets bigger, the mean of those samples converges to the true mean of the population.
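A quick simulation makes the law concrete (a toy sketch of my own, not from the article): draw i.i.d. Uniform(0, 1) samples, whose true mean is 0.5, and watch the gap between the sample mean and the true mean shrink as the sample count grows.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def mean_gap(n_samples, true_mean=0.5):
    """Draw n_samples i.i.d. Uniform(0, 1) values and return
    the absolute gap |sample mean - true mean|."""
    total = sum(random.random() for _ in range(n_samples))
    return abs(total / n_samples - true_mean)

# The gap shrinks (roughly like 1/sqrt(n)) as n grows.
print(mean_gap(100))
print(mean_gap(1_000_000))
```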
But Markov believed independence was not a necessary condition for the mean to converge. So he set out to define how the average of the outcomes from a process involving dependent random variables could converge over time. Read More
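To see this convergence in action, here is a minimal sketch of such a process: a hypothetical two-state "workout routine" Markov chain (the states and transition probabilities are made up for illustration). Even though each day's workout depends on the previous day's, repeatedly applying the transition matrix drives the distribution to a unique stationary one, regardless of the starting state.

```python
import numpy as np

# Hypothetical workout routine: today's workout depends on yesterday's.
# States: 0 = cardio, 1 = weights. Row i, column j = P(next = j | current = i).
P = np.array([[0.3, 0.7],
              [0.6, 0.4]])

# Power iteration: start fully in "cardio" and repeatedly apply P. For an
# irreducible, aperiodic chain this converges to the stationary distribution.
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = pi @ P

# The stationary distribution solves pi = pi @ P with pi summing to 1;
# for this matrix that is [6/13, 7/13], roughly [0.4615, 0.5385].
print(pi)
```

Starting from [0.0, 1.0] instead gives the same limit, which is exactly the "independence is not necessary for convergence" point Markov was making.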
