Feature Store Architectures Every Data Scientist And Big Data Engineer Should Know

Comprehensive and Comparative List of Feature Store Architectures for Data Scientists and Big Data Professionals.

The feature store has become an important building block for organizations developing predictive services in any industry domain.

… This blog post highlights the features supported by different Feature Store frameworks, which have primarily been developed by leading industry giants. Read More

#devops, #mlaas, #big7

How to get your data scientists and data engineers rowing in the same direction

In the slow process of developing machine learning models, data scientists and data engineers need to work together, yet they often work at cross purposes. As ludicrous as it sounds, I’ve seen models take months to get to production because the data scientists were waiting for data engineers to build production systems to suit the model, while the data engineers were waiting for the data scientists to build a model that worked with the production systems.

A previous VentureBeat article reported that 87% of machine learning projects don’t make it into production, with a combination of data concerns and lack of collaboration among the primary factors. On the collaboration side, the tension between data engineers and data scientists — and how they work together — can lead to unnecessary frustration and delays. While team alignment and empathy building can alleviate these tensions, adopting some emerging MLOps technologies can help address the issues at their root. Read More

#devops

Introducing the Model Card Toolkit for Easier Model Transparency Reporting

Machine learning (ML) model transparency is important across a wide variety of domains that impact people’s lives, from healthcare to personal finance to employment. The information needed by downstream users will vary, as will the details that developers need in order to decide whether or not a model is appropriate for their use case. This desire for transparency led us to develop Model Cards, a structured framework for reporting on ML model provenance, usage, and ethics-informed evaluation. A Model Card gives a detailed overview of a model’s suggested uses and limitations, which can benefit developers, regulators, and downstream users alike.

Over the past year, we’ve launched Model Cards publicly and worked to create Model Cards for open-source models released by teams across Google. Read More
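To make the structure concrete, here is a minimal sketch of the kind of record a model card captures. The field names and example values are illustrative simplifications for this digest, not the Model Card Toolkit's actual schema.

```python
# Illustrative sketch of a simplified model-card record; field names and
# example values are made up and do not mirror the Model Card Toolkit schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvaluationResult:
    metric: str       # e.g. "AUC" or "false positive rate"
    data_slice: str   # population slice the metric was computed on
    value: float


@dataclass
class ModelCard:
    name: str
    version: str
    owners: List[str]
    training_data: str                 # provenance of the training set
    intended_uses: List[str]           # suggested uses
    limitations: List[str]             # known failure modes / out-of-scope uses
    ethical_considerations: List[str]
    evaluations: List[EvaluationResult] = field(default_factory=list)


# Hypothetical model and values, purely for illustration.
card = ModelCard(
    name="toxicity-classifier",
    version="1.2.0",
    owners=["ml-team@example.com"],
    training_data="Public comment corpus, 2018 snapshot",
    intended_uses=["Flag comments for human review"],
    limitations=["Not evaluated on non-English text"],
    ethical_considerations=["May reflect annotator bias"],
    evaluations=[
        EvaluationResult("AUC", "overall", 0.94),
        EvaluationResult("AUC", "comments with identity terms", 0.89),
    ],
)
print(card.name, card.intended_uses, card.limitations)
```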

#big7, #devops, #trust

Google’s TF-Coder tool automates machine learning model design

Researchers at Google Brain, one of Google’s AI research divisions, developed an automated tool for programming in machine learning frameworks like TensorFlow. They say it achieves better-than-human performance on some challenging development tasks, taking seconds to solve problems that take human programmers minutes to hours. Read More
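TF-Coder works from input-output examples: the user supplies example tensors and the desired result, and the tool searches for a TensorFlow expression that reproduces it. As a rough illustration (the tensors are toy data of my own, and the expression shown is plain TensorFlow rather than the tool's own API), this is the kind of tensor-manipulation task such a tool can solve in seconds:

```python
# Illustration of the programming-by-example tasks TF-Coder targets.
# The tensors are toy data; the candidate expression is ordinary TensorFlow,
# shown as the sort of solution a synthesis tool searches for automatically.
import tensorflow as tf

# What a user supplies: example inputs and the desired output.
rows = tf.constant([10, 20, 30])
cols = tf.constant([1, 2, 3])
desired_output = tf.constant([[11, 12, 13],
                              [21, 22, 23],
                              [31, 32, 33]])

# A candidate expression such a tool could discover: broadcasted addition.
candidate = tf.add(tf.expand_dims(rows, 1), cols)

# Only candidates that reproduce the desired output exactly are kept.
assert bool(tf.reduce_all(tf.equal(candidate, desired_output)))
print(candidate.numpy())
```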

#big7, #devops, #frameworks

A new neural network could help computers code themselves

Computer programming has never been easy. The first coders wrote programs out by hand, scrawling symbols onto graph paper before converting them into large stacks of punched cards that could be processed by the computer. One mark out of place and the whole thing might have to be redone.

Nowadays coders use an array of powerful tools that automate much of the job, from catching errors as you type to testing the code before it’s deployed. But in other ways, little has changed. That’s why some people think we should just get machines to program themselves.

Justin Gottschlich, director of the machine programming research group at Intel, and his colleagues call this machine programming. Read More

#devops, #nlp

Major DevOps Challenges and How to Address Them

The genesis of DevOps lies in the need to break down silos, take better ownership of the delivered product, and collaborate better across teams. It spans two major functions of the business: Development and Operations.

Typically, DevOps is the practice of the development and operations teams working together from the start of the software development lifecycle (SDLC) through deployment and operations.

…Whether it is aligning goals and priorities to promote cross-functional team collaboration or moving away from older infrastructure models, DevOps poses certain challenges for enterprises. Read More

#devops

Machine Learning for a Better Developer Experience

Imagine having to go through 2.5GB of log entries from a failed software build — 3 million lines — to search for a bug or a regression that happened on line 1M. It’s probably not even doable manually! However, one smart approach to make it tractable might be to diff the lines against a recent successful build, with the hope that the bug produces unusual lines in the logs.

Standard md5 diff would run quickly but still produce at least hundreds of thousands of candidate lines to look through, because it surfaces character-level differences between lines. Fuzzy diffing using k-nearest neighbors clustering from machine learning (the kind of thing logreduce does) produces around 40,000 candidate lines but takes an hour to complete. Our solution produces 20,000 candidate lines in 20 minutes of computing — and thanks to the magic of open source, it’s only about a hundred lines of Python code. Read More
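Here is a rough sketch of the nearest-neighbour idea, with scikit-learn and toy log lines standing in for real build output (this is not the article's actual code): index the lines of a successful build, then surface the failed-build lines that sit far from any known-good neighbour.

```python
# Rough sketch of nearest-neighbour log diffing with scikit-learn.
# Toy log lines stand in for real build output; the threshold is arbitrary.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neighbors import NearestNeighbors

good_log = [
    "INFO fetching dependencies",
    "INFO compiling module core",
    "INFO tests passed: 1532",
]
failed_log = [
    "INFO fetching dependencies",
    "INFO compiling module core",
    "ERROR segfault in worker 7 while loading libfoo.so",
]

# Hash word n-grams so unseen log lines need no pre-fitted vocabulary.
vectorizer = HashingVectorizer(analyzer="word", ngram_range=(1, 3), n_features=2**18)
good_vecs = vectorizer.transform(good_log)

# Index the known-good lines, then measure each failed-build line's
# cosine distance to its nearest known-good neighbour.
index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(good_vecs)
distances, _ = index.kneighbors(vectorizer.transform(failed_log))

THRESHOLD = 0.3  # arbitrary cut-off for this toy example
for line, dist in zip(failed_log, distances.ravel()):
    if dist > THRESHOLD:
        print(f"candidate anomaly ({dist:.2f}): {line}")
```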

#devops

Decision points in storage for artificial intelligence, machine learning and big data

Artificial intelligence and machine learning storage is not one-size-fits-all. Analytics workloads differ, and so do their storage requirements for capacity, latency, throughput and IOPS. We look at the key decision points. Read More

#devops

18 Handy Resources for Machine Learning Practitioners

Machine Learning is a diverse field that covers a wide territory and has impacted many verticals. It can tackle tasks in language and image processing, anomaly detection, credit scoring, sentiment analysis, and forecasting, alongside dozens of other downstream tasks. A proficient developer in this line of work has to be able to draw, borrow, and steal from many adjacent fields such as mathematics, statistics, programming, and, most importantly, common sense. I for one have drawn tremendous benefit from the myriad of tools available to break complex tasks down into smaller, more manageable components. It turns out that developing and training a model takes only a small fraction of the project duration; the bulk of the time and resources is spent on data acquisition, preparation, hyperparameter tuning, optimization, and model deployment. I have been successful in building a systematic knowledge base that has helped my team tackle some common yet tough challenges. Read More

#devops, #mlaas

MLOps with a Feature Store

If AI is to become embedded in the DNA of enterprise computing systems, enterprises must first realign their machine learning (ML) development processes to include data engineers, data scientists and ML engineers in a single automated development, integration, testing, and deployment pipeline. This blog introduces platforms and methods for continuous integration (CI), continuous delivery (CD), and continuous training (CT) with machine learning platforms, with details on how to do CI/CD machine learning operations (MLOps) with a Feature Store. We will see how the Feature Store refactors the monolithic end-to-end ML pipeline into a feature engineering pipeline and a model training pipeline. Read More
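As a hedged sketch of that refactoring, here is a toy in-memory FeatureStore class standing in for a real product's API: the feature engineering pipeline writes curated features once, and the model training pipeline reads them back without redoing the raw-data transformations.

```python
# Toy in-memory feature store, not any particular product's API, used to
# sketch the split into a feature engineering pipeline (writes) and a
# model training pipeline (reads).
import pandas as pd


class FeatureStore:
    """Hypothetical stand-in for a real feature store client."""

    def __init__(self):
        self._feature_groups = {}

    def save_feature_group(self, name, df):
        self._feature_groups[name] = df

    def get_training_data(self, names, join_key):
        out = self._feature_groups[names[0]]
        for name in names[1:]:
            out = out.merge(self._feature_groups[name], on=join_key)
        return out


store = FeatureStore()

# --- Feature engineering pipeline: transform raw data, write features once.
raw_orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [120.0, 60.0, 80.0]})
features = (raw_orders.groupby("customer_id", as_index=False)["amount"]
            .mean()
            .rename(columns={"amount": "avg_order_amount"}))
store.save_feature_group("customer_orders", features)

# --- Model training pipeline: read curated features plus labels and build
# a training set without touching the raw data again.
labels = pd.DataFrame({"customer_id": [1, 2], "churned": [0, 1]})
store.save_feature_group("churn_labels", labels)
train_df = store.get_training_data(["customer_orders", "churn_labels"],
                                   join_key="customer_id")
print(train_df)
```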

#devops