Why Do Companies Focus on Data Structures and Algorithms in Tech Interviews?

Data Structures and Algorithms (DSA) is a skill you must learn if you want to work as a programmer, developer, or data scientist, particularly at large tech companies. Although it may not come up directly in everyday coding work, a solid understanding of DSA helps the software development process run smoothly. It helps a programmer adopt a reasoned strategy for understanding and solving a problem.

Most tech companies use DSA questions to evaluate a candidate’s skills. This blog discusses why DSA matters for your coding career, along with tips on how to prepare for interviews. Read More

#data-science, #devops

An Interview With the Guy Who Has All Your Data

It’s 10 pm. Do you know where your data is? Chad Engelgau does. He’s the CEO of Acxiom, a data broker. Your info is probably on one of his servers.

Chad Engelgau is the CEO of Acxiom, a data broker that operates one of the world’s biggest repositories of consumer information. The company claims to have granular details on more than 2.5 billion people across 62 countries. The chances that Acxiom knows a whole lot about you, reader, are good.

In many respects, data brokering is a shadowy enterprise. The industry mostly operates in quiet business deals the public never hears about, especially smaller firms that engage with data on particularly sensitive subjects. Compared to other parts of the tech industry, data brokers face little scrutiny from regulators, and in large part they evade attention from the media. Read More

#data-science

Unstructured Data Challenges for 2023 and their Solutions

Unstructured data is information that does not have a pre-defined structure. It’s one of the three core data types, along with structured and semi-structured formats.

Examples of unstructured data include call logs, chat transcripts, contracts, and sensor data; these datasets are not arranged according to a preset data model. To make unstructured data machine-readable, i.e., ready for analysis and interpretation, it must be standardized and structured into rows and columns. That is what makes managing unstructured data difficult. Read More
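
As a minimal sketch of what that standardization step can look like, the snippet below parses a hypothetical chat transcript into a structured table with pandas. The transcript format, column names, and regex are illustrative assumptions, not a prescribed schema.

```python
import re
import pandas as pd

# Hypothetical transcript format: "[2023-01-05 14:32] alice: the shipment is delayed"
LINE_PATTERN = re.compile(r"\[(?P<timestamp>[^\]]+)\]\s+(?P<speaker>\w+):\s+(?P<message>.*)")

raw_transcript = [
    "[2023-01-05 14:32] alice: the shipment is delayed",
    "[2023-01-05 14:33] bob: which order number?",
    "not a well-formed line",          # unstructured noise that is skipped
]

rows = []
for line in raw_transcript:
    match = LINE_PATTERN.match(line)
    if match:                          # keep only lines that fit the schema
        rows.append(match.groupdict())

# Structured, machine-readable table: one row per message, typed columns
df = pd.DataFrame(rows)
df["timestamp"] = pd.to_datetime(df["timestamp"])
print(df)
```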

#data-science

Why big data is not a priority anymore, and other key AI trends to watch

Artificial Intelligence models that generate entirely new content are creating a world of opportunities for entrepreneurs. And engineers are learning to do more with less.

Those were some takeaways from a panel discussion at the Intelligent Applications Summit hosted by Madrona Venture Group in Seattle this week.

“Big data is not a priority anymore, in my opinion,” said Stanford computer science professor Carlos Guestrin. “You can solve complex problems with little data.”

Engineers are more focused on fine-tuning off-the-shelf models, said Guestrin, co-founder of Seattle machine learning startup Turi, which was acquired by Apple in 2016. New “foundation” AI models like DALL-E and GPT-3 can generate images or text from initial prompts. Read More
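
To make the “fine-tune an off-the-shelf model” idea concrete, here is a minimal sketch using a pretrained torchvision ResNet as a stand-in for a foundation model: freeze the backbone, swap in a new head, and train only that head. The class count and dummy batch are placeholder assumptions, not anything from the article.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load an off-the-shelf model pretrained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 5-class task
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch (replace with a real DataLoader)
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```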

#data-science, #strategy

The Evolution of The Data Engineer: A Look at The Past, Present & Future

There’s a buzz of excitement around data engineering right now, and for good reason. Since its inception, the data engineering field has shown no sign of slowing down, and new technologies and concepts have been appearing particularly fast lately. As we near the end of 2022, it is a good moment to take a step back and evaluate the current state of data engineering.

What might the data engineer role of today look like in the future? Will it even exist? In this blog post, I look at the past and the present of the data engineering role, examining emerging trends to offer some predictions about the future. Read More

#data-science

Meta AI’s shocking insight about Big Data and Deep Learning

Thanks to the amazing success of AI, we’ve seen more and more organizations implement Machine Learning in their pipelines. As access to and collection of data increase, we have seen massive datasets being used to train giant deep learning models that reach superhuman performance. This has led to a lot of hype around domains like Data Science and Big Data, fueled even more by the recent boom in Large Language Models.

Big Tech companies (and Deep Learning experts on Twitter/YouTube) have really fallen in love with the ‘add more data, increase model size, train for months’ approach that has become the status quo in Machine Learning these days. However, heretics from Meta AI published research that was funded by Satan, and it turns out this way of doing things is extremely inefficient. And completely unnecessary. In this post, I will be going over their paper, “Beyond neural scaling laws: beating power law scaling via data pruning,” where they share ‘evidence’ about how selecting samples intelligently can increase your model performance without ballooning your costs out of control. While this paper focuses on Computer Vision, the principles of their research will be interesting to you regardless of your specialization. Read More
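
As a rough illustration of data pruning (a simplified stand-in, not the paper’s actual metric, which is based on self-supervised prototypes), here is a sketch that keeps a fraction of a dataset according to a per-example difficulty score. The scoring function and toy values are assumptions for demonstration.

```python
import numpy as np

def prune_by_difficulty(scores: np.ndarray, keep_fraction: float, keep_hard: bool = True) -> np.ndarray:
    """Return indices of examples to keep, given a per-example difficulty score.

    `scores` is a hypothetical difficulty metric (e.g. per-example loss);
    the paper reports that with abundant data it pays to keep the hard
    examples, while with scarce data it pays to keep the easy ones.
    """
    n_keep = int(len(scores) * keep_fraction)
    order = np.argsort(scores)                 # easy (low score) -> hard (high score)
    kept = order[-n_keep:] if keep_hard else order[:n_keep]
    return np.sort(kept)

# Toy usage: prune a dataset of 10 examples down to 60%, keeping the hardest
difficulty = np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.5, 0.4, 0.6, 0.05])
keep_idx = prune_by_difficulty(difficulty, keep_fraction=0.6, keep_hard=True)
print(keep_idx)  # indices of the retained training examples
```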

#data-science, #deep-learning, #big7

State of Data Science 2022: Paving the Way for Innovation

Anaconda’s 2022 State of Data Science report is here! As with years prior, we conducted a survey to gather demographic information about our community, ascertain how that community works, and collect insights into big questions and trends that are top of mind within the community. As the impacts of COVID continue to linger and assimilate into our new normal, we decided to move away from covering COVID themes in our report and instead focus on more actionable issues within the data science, machine learning (ML), and artificial intelligence industries, like open-source security, the talent dilemma, ethics and bias, and more. Read More

Read the Report

#data-science

Why it’s time for “data-centric artificial intelligence”

Machine learning pioneer Andrew Ng argues that focusing on the quality of data fueling AI systems will help unlock its full power.

The last 10 years have brought tremendous growth in artificial intelligence. Consumer internet companies have gathered vast amounts of data, which has been used to train powerful machine learning programs. Machine learning algorithms are widely available for many commercial applications, and some are open source.

Now it’s time to focus on the data that fuels these systems, according to AI pioneer Andrew Ng, SM ’98, the founder of the Google Brain research lab, co-founder of Coursera, and former chief scientist at Baidu.

Ng advocates for “data-centric AI,” which he describes as “the discipline of systematically engineering the data needed to build a successful AI system.” Read More
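
One small example of what “systematically engineering the data” can mean in practice is hunting for mislabeled examples. The sketch below is a hypothetical illustration (not Ng’s own tooling): it flags training examples whose given label a cross-validated model finds unlikely, so they can be reviewed or relabeled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic dataset with a few deliberately flipped labels
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y), size=15, replace=False)
y_noisy = y.copy()
y_noisy[flipped] = 1 - y_noisy[flipped]

# Out-of-fold predicted probabilities: each example is scored by a model
# that never saw it during training
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y_noisy,
                          cv=5, method="predict_proba")

# Flag examples whose given label the model finds very unlikely;
# these are candidates for relabeling or review
confidence_in_given_label = proba[np.arange(len(y_noisy)), y_noisy]
suspects = np.argsort(confidence_in_given_label)[:20]
print("examples to review:", suspects)
```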

#data-science, #mlops

How to transition into a career in ML/AI

Read More

#data-science, #videos

How to build a Data Analytics Pipeline on Google Cloud?

Read More

#data-science, #videos