State of Data Science 2022: Paving the Way for Innovation

Anaconda’s 2022 State of Data Science report is here! As in prior years, we conducted a survey to gather demographic information about our community, ascertain how that community works, and collect insights into the big questions and trends that are top of mind within it. As the impacts of COVID continue to linger and assimilate into our new normal, we decided to move away from covering COVID themes in our report and instead focus on more actionable issues within the data science, machine learning (ML), and artificial intelligence (AI) industries, like open-source security, the talent dilemma, ethics and bias, and more.

#data-science

Why it’s time for “data-centric artificial intelligence”

Machine learning pioneer Andrew Ng argues that focusing on the quality of data fueling AI systems will help unlock its full power.

The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.

Now it’s time to focus on the data that fuels these systems, according to AI pioneer Andrew Ng, SM ’98, the founder of the Google Brain research lab, co-founder of Coursera, and former chief scientist at Baidu.

Ng advocates for “data-centric AI,” which he describes as “the discipline of systematically engineering the data needed to build a successful AI system.”

#data-science, #mlops

How to transition into a career in ML/AI

#data-science, #videos

How to build a Data Analytics Pipeline on Google Cloud?

#data-science, #videos

How To Become A Full Stack Data Scientist In 2022

2022 is here, and data science remains one of the sexiest and highest-paying jobs.

In 2021 and the years before that, data science saw a quick spike in growth, especially during the peak of the COVID-19 pandemic, and many industries jumped on the power of data science to draw the most value from their products.

Many industries hired more people with data science and analytical skills than with any other skill set, in any department.

Not only did companies chase data scientists, but many people also jumped on the trend of becoming one. Some changed profession entirely, moving from another domain into data science, like one of my students, Evelyn, who was a Marketing Manager (salary: $62,710) and is now a Data Scientist (salary: $123,444).

People often ask me: is data science going to continue to be attractive in 2022 and the upcoming years?

The answer is YES!!

#data-science

3 steps for creating a data-to-value ecosystem

The key to managing a mountain of data and disruptive technologies may lie in establishing a center of competency.

Although many organizations are using artificial intelligence (AI) and machine learning (ML) tools as core enablers in their data analytics projects, and AI spending worldwide continues to rise, the hard truth is that most data science projects are doomed to fail.

There are several reasons for these failures, ranging from the inherent complexity of AI/ML initiatives and the persistent lack of skilled talent to challenges in data security, governance, and data integration. These issues are collectively referred to as concerns for “data readiness,” according to an IDC global survey of more than 2,000 IT and line-of-business decision-makers, all of whom are involved in some level of AI use or development.

#data-science

Never invest your time in learning complex things.

The data scientist hype train has come to a grinding halt. It has been a joy ride for me, for I was one of the people who got hooked on data science as soon as it came out. Math, engineering, and the ability to predict stuff were very attractive indeed for a self-professed geek. I couldn’t resist, and soon I was devouring one book after another. I started with Springer publications (Max Kuhn), Trevor Hastie, and a lot of O’Reilly books, and followed them up with statistics and math courses until I had the math and the techniques (linear/logistic regression, SVM, random forests, decision trees, and some 20 others) down pat. Sounds great, right? Not quite.

Then came the deep learning revolution. I was first exposed to it thanks to Jeremy Howard, who in my opinion still runs the best damn deep learning course on the internet. He explains vision, NLP, and even structured-data machine learning. The guy is literally able to translate gobbledygook for the masses (me :-)). Plug: https://www.fast.ai/

#data-science, #training

How to Become a Data Scientist – A Complete Roadmap

#data-science

Markov models and Markov chains explained in real life: probabilistic workout routine

Markov defined a way to represent real-world stochastic systems and processes that encode dependencies and reach a steady state over time.

Andrei Markov didn’t agree with Pavel Nekrasov when he said that independence between variables was necessary for the Weak Law of Large Numbers to apply.

The Weak Law of Large Numbers states something like this:

When you collect independent samples, as the number of samples gets bigger, the mean of those samples converges to the true mean of the population.
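In symbols, the statement above can be written as follows. This is the standard textbook form, which additionally assumes the samples $X_1, \dots, X_n$ are identically distributed with a finite mean $\mu$ (an assumption added here, not spelled out in the excerpt):

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i ,
\qquad
\lim_{n \to \infty} \Pr\left( \left| \bar{X}_n - \mu \right| > \varepsilon \right) = 0
\quad \text{for every } \varepsilon > 0 .
```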

But Markov believed independence was not a necessary condition for the mean to converge. So he set out to define how the average of the outcomes from a process involving dependent random variables could converge over time.
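To make the idea of dependent outcomes still averaging out a bit more concrete, here is a minimal sketch in Python. The three workout states and the transition probabilities in `P` are made up for illustration and are not taken from the article. The sketch computes the steady-state distribution of the chain and compares it with the fraction of days spent in each state during a long simulated run, where each day depends on the day before.

```python
import numpy as np

# Hypothetical workout routine: row = today's activity, column = tomorrow's.
# These probabilities are illustrative assumptions, not data from the article.
STATES = ["run", "weights", "rest"]
P = np.array([
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.5, 0.4, 0.1],
])

def stationary_distribution(P):
    """Solve pi @ P = pi together with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def simulate_frequencies(P, steps, seed=0):
    """Walk the chain and return the fraction of steps spent in each state."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    counts = np.zeros(n)
    state = 0  # start with a run; the long-run fractions do not depend on this
    for _ in range(steps):
        counts[state] += 1
        state = rng.choice(n, p=P[state])  # tomorrow depends only on today
    return counts / steps

pi = stationary_distribution(P)
freq = simulate_frequencies(P, 20_000)

for name, p_steady, p_observed in zip(STATES, pi, freq):
    print(f"{name:8s} steady state {p_steady:.3f}   simulated {p_observed:.3f}")
```

Even though consecutive days are not independent, the simulated fractions should settle close to the steady-state values, which is the kind of convergence Markov was after.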

#data-science

The Role of Surrogate Models in the Development of Digital Twins of Dynamic Systems

Digital twin technology has significant promise, relevance and potential for widespread applicability in various industrial sectors such as aerospace, infrastructure and automotive. However, the adoption of this technology has been slow due to a lack of clarity for specific applications. A discrete damped dynamic system is used in this paper to explore the concept of a digital twin. As digital twins are also expected to exploit data and computational methods, there is a compelling case for the use of surrogate models in this context. Motivated by this synergy, we have explored the possibility of using surrogate models within the digital twin technology. In particular, the use of a Gaussian process (GP) emulator within the digital twin technology is explored. GP has the inherent capability of addressing noisy and sparse data and hence makes a compelling case to be used within the digital twin framework. Cases involving stiffness variation and mass variation are considered, individually and jointly, along with different levels of noise and sparsity in data. Our numerical simulation results clearly demonstrate that surrogate models such as GP emulators have the potential to be an effective tool for the development of digital twins. Aspects related to data quality and sampling rate are analysed. Key concepts introduced in this paper are summarised and ideas for urgent future research needs are proposed.
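As a rough illustration of why a GP emulator suits noisy, sparse data, the sketch below fits a GP to a handful of noisy stiffness readings and predicts the stiffness on a fine time grid. It is not the paper's implementation: the degradation curve in `true_stiffness`, the noise level, the squared-exponential kernel, and its hyperparameters are all assumptions chosen for this example.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test,
               length_scale=3.0, variance=1.0, noise_std=0.02):
    """Posterior mean and standard deviation of a GP with a constant mean."""
    offset = y_train.mean()  # centre the data so a zero-mean prior is reasonable
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K += noise_std ** 2 * np.eye(len(x_train))  # observation noise on the diagonal
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - offset))
    mean = K_s.T @ alpha + offset
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)  # diagonal of the posterior covariance
    return mean, np.sqrt(np.clip(var, 0.0, None))

def true_stiffness(t):
    """Hypothetical normalised stiffness that degrades over time (assumption)."""
    return 1.0 - 0.3 * (1.0 - np.exp(-0.3 * t))

rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 10.0, 12)  # sparse measurement times
k_obs = true_stiffness(t_obs) + rng.normal(0.0, 0.02, size=t_obs.shape)  # noisy readings

t_grid = np.linspace(0.0, 10.0, 101)
k_mean, k_std = gp_predict(t_obs, k_obs, t_grid)
max_error = np.max(np.abs(k_mean - true_stiffness(t_grid)))
print(f"largest gap between GP mean and assumed true stiffness: {max_error:.4f}")
```

The posterior standard deviation `k_std` gives an uncertainty band around the prediction. In practice the kernel hyperparameters would be estimated from the data rather than fixed by hand as they are here.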

#data-science, #robotics