Issue #248: Data Labeling quality in focus November 20th 2021

Special Edition - Scaling dataset labeling with quality and speed

Getting quality labeled data, and getting it fast, is often the obstacle to getting good results from experimental models. In order to stay agile, research teams and startups alike need access to the best training data possible in the shortest amount of time.

When lumber prices rose drastically during the Covid-19 pandemic, measuring the weight, dimensions, and quality of lumber became more important than ever before. Measuring, or scaling, lumber has historically been done manually. It’s a dangerous process, requiring individuals to work in environments where logs are piled high and people and heavy machinery operate in close proximity. TimberEye provides a mobile application that leverages the latest in computer vision and LiDAR mapping technology to enable lumber suppliers as well as buyers to categorize and scale logs up to 20x faster, more safely and consistently, and with better accuracy. Typical human operators differ from verified measurements by 1.2 to 1.5 cm, while the TimberEye app delivered highly accurate measurements, with an average difference from verified manual measurements of just 0.3 cm.

To continue to enhance the app’s log scaling capabilities, the TimberEye team wanted to experiment with an instance segmentation model. But semantically segmenting images was a laborious and time-consuming process that delayed experimentation.

Scott Gregg, CEO and Founder of TimberEye, said that the team kept pushing the segmentation work to the back burner and almost abandoned it. But three days after kicking off the project with Scale Rapid, they had all the data they needed in perfect shape.

Scale was founded to solve the challenge of scaling data labeling pipelines to production-level volumes.

In The News

TimberEye. Enhancing Log Scaling and Inventory Management with Scale Rapid

Faster, safer, and more accurate log scaling. TimberEye provides a mobile application that leverages the latest in computer vision and LiDAR mapping technology to enable lumber suppliers as well as buyers to categorize and scale logs up to 20x faster, more safely and consistently, and with better accuracy.

4 Reasons Why You Need To Outsource Your Data Annotation Project

For a lot of companies, the mere idea of developing even a simple AI model can lead them to assume it will cost millions of dollars. Oftentimes, that assumption turns out to be true. However, every cost that you incur should give you significant returns. That’s the only way you know you’ve invested in something wisely.

Scale AI acquires SiaSearch to give companies the same data engine superpowers that Tesla has

Scale AI has acquired a small startup that could help expand its reach in Europe and speed up development of its newest product. The startup, which is already working with automakers like Volkswagen and Porsche, is able to automatically index and structure the raw sensor data collected by fleets of vehicles.

Applied use cases

Improving Reliability of Solar Power with Data Annotation

The ever-increasing efficiency of solar panel technology, combined with improvements in manufacturing, installation, and maintenance, means that this vital renewable power resource is set to become a major component in our energy infrastructure.

SimNet In the Home

We assembled a team of researchers from UC-Berkeley, Google, Amazon and MIT to ask the question: Can we drastically reduce the cost of data annotation? In response, we have developed SimNet, a low-quality procedural simulator for tackling 3D perception in the wild.

Healthcare

GA4GH: International policies and standards for data sharing across genomic research and healthcare: Cell Genomics

The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches.

AI Helps Humans Annotate Biological Data in Real-Time

A new study published in npj Digital Medicine demonstrates how Human-Augmenting Labeling System (HALS) can reduce the manual work of labeling data by over 90 percent and increase the quality of biological data annotation.

Computer vision

Game Changers: Nature Methods

In this blog post, Rita Strack, Handling Editor at Nature Methods for “Deep learning enables fast and dense single-molecule localization with high accuracy”, describes how deep learning is used to improve super-resolution microscopy, and explains why the journal plays a part in improving laboratory techniques and methods.

Dense Unsupervised Learning for Video Segmentation

This repository contains the official implementation of our paper: Dense Unsupervised Learning for Video Segmentation. Nikita Araslanov, Simone Schaub-Meyer and Stefan Roth. To appear at NeurIPS 2021.

Natural language processing

Continuous Learning in NMT using Bilingual Dictionaries

With a large amount of parallel data, we can achieve impressive translation quality with neural machine translation (NMT). The challenge is how to enable NMT to adapt well to new knowledge that is not covered in the training data.

IBM to Add New Natural Language Processing Enhancements to Watson Discovery

These planned updates are designed to help business users in industries such as financial services, insurance and legal services enhance customer care and accelerate business processes by uncovering insights and synthesizing information from complex documents.

Sponsor

Stop Waiting For Your Labels

Scale Rapid is a new product that enables machine learning engineers and researchers to receive high-quality labels and instruction feedback in as little as one hour.

With Scale Rapid you can:

Experiment quickly by setting up labeling projects in minutes and receiving initial datasets within hours.
Iterate over potential edge cases and instructions by getting fast, real-time feedback on labeling instructions, ensuring high-quality annotations.
Scale to production-level pipelines in days with precision quality control. No need for lengthy contract negotiations, purchase orders, or committed spend.
