The role of academia in data science education

I was recently asked to moderate an academic panel on the role of universities in training the data science workforce. I preceded each question with an opinionated introduction, and I have fused those introductions into this blog post. These are weakly held opinions, so please consider commenting if you disagree with anything.

To discuss data science education, we first need to clearly state what it means. The panel organizers defined data science as “an emerging discipline that draws upon knowledge in statistical methodology and computer science to create impactful predictions and insights for a wide range of traditional scholarly fields.” But is it an academic discipline? If so, what are the fundamental principles, expertise, skills, and knowledge base shared by data scientists? Is there a core curriculum for data science? Providing a more detailed definition might help. Read More

#universities

The Growing Marketplace For AI Ethics

As companies have raced to adopt artificial intelligence (AI) systems at scale, they have also sped through, and sometimes spun out, in the ethical obstacle course AI often presents.

AI-powered loan and credit approval processes have been marred by unforeseen bias. Same with recruiting tools. Smart speakers have secretly turned on and recorded thousands of minutes of audio of their owners.

Unfortunately, there’s no industry-standard, best-practices handbook on AI ethics for companies to follow—at least not yet. Some large companies, including Microsoft and Google, are developing their own internal ethical frameworks.

A number of think tanks, research organizations, and advocacy groups, meanwhile, have been developing a wide variety of ethical frameworks and guidelines for AI. Below is a brief roundup of some of the more influential models to emerge—from the Asilomar Principles to best-practice recommendations from the AI Now Institute. Read More

#ethics

Statistical Significance Tests for Comparing Machine Learning Algorithms

Comparing machine learning methods and selecting a final model is a common operation in applied machine learning.

Models are commonly evaluated using resampling methods like k-fold cross-validation from which mean skill scores are calculated and compared directly. Although simple, this approach can be misleading as it is hard to know whether the difference between mean skill scores is real or the result of a statistical fluke.

Statistical significance tests are designed to address this problem and quantify the likelihood of the samples of skill scores being observed given the assumption that they were drawn from the same distribution. If this assumption, or null hypothesis, is rejected, it suggests that the difference in skill scores is statistically significant.

Although not foolproof, statistical hypothesis testing can improve both your confidence in the interpretation and the presentation of results during model selection. Read More
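As a rough illustration of the idea, the sketch below compares two models' per-fold scores with an exact paired sign-flip permutation test. This is only one of several possible tests and is not necessarily the one the linked article recommends. The function name `paired_permutation_test` and the score lists are hypothetical, made up for this example. Keep in mind that k-fold scores share training data, so the folds are not fully independent and the resulting p-value should be read with caution.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired per-fold scores.

    Null hypothesis: both sets of scores come from the same distribution,
    so the sign of each per-fold difference is arbitrary.
    Returns the fraction of sign assignments whose absolute summed
    difference is at least as large as the observed one (the p-value).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))

    extreme = 0
    total = 0
    # Enumerate every way of flipping the signs of the differences (2**k cases).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if permuted >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical accuracy scores from the same 10 cross-validation folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.81, 0.80]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80, 0.79, 0.77, 0.81, 0.78, 0.78]

p_value = paired_permutation_test(model_a, model_b)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the difference is unlikely to be a fluke.")
else:
    print("Fail to reject the null hypothesis: the difference may be chance.")
```

The exact enumeration is practical only for a small number of folds, since the number of sign assignments doubles with each extra fold. For larger samples, a random subset of sign assignments is a common substitute.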

#accuracy

20 Best YouTube Channels for AI and Machine Learning

Two Minute Papers: https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg

Arxiv Insights: https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg/featured

The Artificial Intelligence Channel: https://www.youtube.com/channel/UC5g-f-g4EVRkqL8Xs888BLA

Subscribe to these and other YouTube channels today for AI, machine learning, and computer science tutorial videos.

We recommend these YouTube channels regardless of your machine learning experience, whether you have a computer science degree or just a passing interest in AI. Through easy-to-follow demos and tutorial videos, you’ll be on the way toward mastering the basics of AI, machine learning, and computer science in no time. Read More

#videos

The global trend of platformification

Read More

#ai-first, #strategy, #videos

The API Economy — Disruption and the Business of APIs

In 2006, the predominant forms of digital social communication were still email and AOL Instant Messenger. A decade later, things have changed rapidly in the face of higher bandwidth, greater capabilities, and the explosion of social media. Software-as-a-Service is an area that is growing exponentially. The digital platformification of older industries, paired with new advances in the Internet of Things (IoT), makes the API space, the powerhouse driving Internet connectivity, ripe for investment.

APIs, or Application Programming Interfaces, are an important cog in this process, and the market surrounding them is thriving. As John Musser of API Science told us, their future ubiquity throughout our digital fabric is inevitable. APIs encourage standardization: have you ever used Twitter to log in to a third-party application? They extend functionality so that more potential is at our fingertips: how often do you query a map embedded in a web application? By exposing assets that developers can use to create new apps, APIs also inspire innovation, promote data matter experts, lead to creative projects, and subtly improve the end user’s experience.

The API space has produced an economy in its own right. APIs can be used to open new monetization streams alongside existing ones, and API-first companies have also emerged that are built entirely around an API service. Twilio, Algolia, Contentful.com, and others are examples of companies that expose an API as their main product. Read More

#ai-first, #strategy

Six Ways To Prepare Your Team For A Digital Transformation

1. Continually update your mindset to demystify changes.
2. Take a look at what could be automated.
3. Determine a strategic approach to reskilling.
4. Address new job requirements with innovative hiring practices.
5. Consider utilizing microlearning tools.
6. Build a culture that supports ongoing skill evolution.

Read More

#strategy

I gotta basketball Jones!

Read More

#robotics, #videos

Visipedia

Visipedia, short for “Visual Encyclopedia,” is a network of people and machines that is designed to harvest and organize visual information and make it accessible to anyone anywhere. Visipedia machines can learn from experts how to discover and classify animals, plants and objects in images. Communities of scientists and interested citizens may use Visipedia software to share, annotate and organize meaningful content in images. Recent experiments include software that can detect and classify trees from satellite and street-level images, and an app that can recognize North American birds. Visipedia is a joint project between Pietro Perona’s Vision Group at Caltech and Serge Belongie’s Vision Group at Cornell Tech. Read More

#universities, #vision

Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection

We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists – crowd annotators who are passionate and knowledgeable about specific domains such as birds or airplanes. We worked with citizen scientists and domain experts to collect NABirds, a new high quality dataset containing 48,562 images of North American birds with 555 categories, part annotations and bounding boxes. We find that citizen scientists are significantly more accurate than Mechanical Turkers at zero cost. We worked with bird experts to measure the quality of popular datasets like CUB-200-2011 and ImageNet and found class label error rates of at least 4%. Nevertheless, we found that learning algorithms are surprisingly robust to annotation errors and this level of training data corruption can lead to an acceptably small increase in test error if the training set has sufficient size. At the same time, we found that an expert-curated high quality test set like NABirds is necessary to accurately measure the performance of fine-grained computer vision systems. We used NABirds to train a publicly available bird recognition service deployed on the web site of the Cornell Lab of Ornithology. Read More

#universities, #vision