Snorkel’s API Documentation

If you’re looking for technical details on Snorkel’s API, you’re in the right place.

For more narrative walkthroughs of Snorkel fundamentals or example use cases, check out our homepage and our tutorials repo.

Snorkel Analysis Package

Generic model analysis utilities shared across Snorkel.

Scorer

Calculate one or more scores from user-specified and/or user-defined metrics.

get_label_buckets

Return data point indices bucketed by label combinations.
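
As a rough illustration of what bucketing by label combination means, the pure-Python sketch below groups indices by the tuple of labels found at each position. The helper name `bucket_indices_by_labels` is hypothetical and not part of Snorkel; the real function's signature and return types may differ.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def bucket_indices_by_labels(
    *label_arrays: Sequence[int],
) -> Dict[Tuple[int, ...], List[int]]:
    """Group data point indices by the combination of labels at each index.

    Hypothetical helper for illustration only; not Snorkel's implementation.
    """
    if not label_arrays:
        raise ValueError("Provide at least one label array")
    n = len(label_arrays[0])
    if any(len(arr) != n for arr in label_arrays):
        raise ValueError("All label arrays must have the same length")

    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    # zip(*label_arrays) yields one tuple of labels per data point
    for i, combo in enumerate(zip(*label_arrays)):
        buckets[tuple(combo)].append(i)
    return dict(buckets)


gold = [1, 0, 1, 0]
pred = [1, 1, 1, 0]
buckets = bucket_indices_by_labels(gold, pred)
```

Here the key `(0, 1)` collects the indices where the first array says 0 and the second says 1, which is a convenient way to pull out one kind of disagreement for inspection.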

get_label_instances

Return instances in x with the specified combination of labels.

metric_score

Evaluate a standard metric on a set of predictions/probabilities.
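
The idea of evaluating a named metric on predictions can be sketched as a lookup from metric name to function. Everything below (`METRICS`, `score`, and the two metric functions) is a hypothetical stand-in written for illustration, not Snorkel's own code or signature.

```python
from typing import Callable, Dict, Sequence


def accuracy(golds: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(golds, preds)) / len(golds)


def error_rate(golds: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that do not match the gold labels."""
    return 1.0 - accuracy(golds, preds)


# Hypothetical registry mapping metric names to functions
METRICS: Dict[str, Callable[[Sequence[int], Sequence[int]], float]] = {
    "accuracy": accuracy,
    "error_rate": error_rate,
}


def score(golds: Sequence[int], preds: Sequence[int], metric: str) -> float:
    """Look up a metric by name and evaluate it on the given labels."""
    if metric not in METRICS:
        raise ValueError(f"Unknown metric: {metric}")
    if len(golds) == 0 or len(golds) != len(preds):
        raise ValueError("golds and preds must be non-empty and equal in length")
    return METRICS[metric](golds, preds)


acc = score([1, 0, 1, 1], [1, 1, 1, 0], "accuracy")
```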

Snorkel Augmentation Package

Programmatic data set augmentation: TF creation and data generation utilities.

ApplyAllPolicy

Apply all TFs in order to each data point.
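
Applying every TF in order amounts to function composition over a data point. The sketch below shows that idea in plain Python; `apply_all_in_order` and the two toy TFs are hypothetical names, and the code does not use Snorkel's policy or applier classes.

```python
from typing import Callable, Dict, List

DataPoint = Dict[str, str]
TF = Callable[[DataPoint], DataPoint]


def lowercase_text(x: DataPoint) -> DataPoint:
    # Toy TF: return a copy with the text lowercased
    return {**x, "text": x["text"].lower()}


def strip_text(x: DataPoint) -> DataPoint:
    # Toy TF: return a copy with surrounding whitespace removed
    return {**x, "text": x["text"].strip()}


def apply_all_in_order(x: DataPoint, tfs: List[TF]) -> DataPoint:
    """Feed the output of each TF into the next one."""
    for tf in tfs:
        x = tf(x)
    return x


original = {"text": "  Hello World  "}
augmented = apply_all_in_order(original, [lowercase_text, strip_text])
```

Each toy TF returns a new dictionary rather than mutating its input, so the original data point is still available after augmentation.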

ApplyEachPolicy

Apply each TF individually to each data point.

ApplyOnePolicy

Apply a single TF to each data point.

MeanFieldPolicy

Sample sequences of TFs according to a distribution.
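
Sampling a TF sequence from a distribution can be sketched with the standard library alone: each position in the sequence is drawn independently from a fixed distribution over TF indices. The function name `sample_tf_sequence` and its parameters are assumptions made for this example and do not mirror Snorkel's constructor arguments.

```python
import random
from typing import List, Sequence


def sample_tf_sequence(
    n_tfs: int,
    sequence_length: int,
    p: Sequence[float],
    rng: random.Random,
) -> List[int]:
    """Draw TF indices independently, one per position, with weights p."""
    if len(p) != n_tfs:
        raise ValueError("p must contain one weight per TF")
    # random.choices samples with replacement according to the weights
    return rng.choices(range(n_tfs), weights=p, k=sequence_length)


rng = random.Random(0)  # seeded so repeated runs give the same sequence
sequence = sample_tf_sequence(n_tfs=3, sequence_length=4, p=[0.2, 0.5, 0.3], rng=rng)
```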

PandasTFApplier

TF applier for a Pandas DataFrame.

RandomPolicy

Naive random augmentation policy.

TFApplier

TF applier for a list of data points.

TransformationFunction

Base class for TFs.

transformation_function

Decorate functions to create TFs.

Snorkel Classification Package

PyTorch-based multi-task learning framework for discriminative modeling.

Checkpointer

Manager for checkpointing a model.

CheckpointerConfig

Configuration settings for Checkpointer.

DictDataLoader

A DataLoader that uses the appropriate collate_fn for a DictDataset.

DictDataset

A dataset where both the data fields and labels are stored as dictionaries.

LogManager

A class to manage logging as training progresses.

LogManagerConfig

Configuration settings for LogManager.

LogWriter

A class for writing logs.

LogWriterConfig

Configuration settings for LogWriter.

MultitaskClassifier

A classifier built from one or more tasks to support advanced workflows.

Operation

A single operation (forward pass of a module) to execute in a Task.

Task

A single task (a collection of modules and specified path through them).

TensorBoardWriter

A class for logging to TensorBoard during the training process.

Trainer

A class for training a MultitaskClassifier.

cross_entropy_with_probs

Calculate cross-entropy loss when targets are probabilities (floats), not ints.
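
With probabilistic targets, the per-example loss is the negative sum over classes of the target probability times the log of the predicted probability. The pure-Python sketch below computes that quantity from raw logits and averages over the batch; `soft_cross_entropy` is a hypothetical name, and the code is an illustration of the formula rather than Snorkel's PyTorch implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def soft_cross_entropy(
    logits_batch: Sequence[Sequence[float]],
    target_probs_batch: Sequence[Sequence[float]],
) -> float:
    """Mean over the batch of -sum_k target_k * log_softmax(logits)_k."""
    if len(logits_batch) == 0 or len(logits_batch) != len(target_probs_batch):
        raise ValueError("Inputs must be non-empty and equal in length")
    losses = []
    for logits, probs in zip(logits_batch, target_probs_batch):
        log_p = log_softmax(logits)
        losses.append(-sum(p * lp for p, lp in zip(probs, log_p)))
    return sum(losses) / len(losses)


loss = soft_cross_entropy(
    [[0.0, 0.0], [2.0, 0.0]],  # raw scores for two examples
    [[0.5, 0.5], [1.0, 0.0]],  # soft target, then a one-hot target
)
```

When a target row is one-hot, the expression reduces to the usual cross-entropy with integer labels.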

Snorkel Labeling Package

Programmatic data set labeling: LF creation, models, and analysis utilities.

apply.dask.DaskLFApplier

LF applier for a Dask DataFrame.

LFAnalysis

Run analyses on LFs using a label matrix.

LFApplier

LF applier for a list of data points.

model.label_model.LabelModel

A model for learning the LF accuracies and combining their output labels.

LabelingFunction

Base class for labeling functions.

model.baselines.MajorityClassVoter

Majority class label model.

model.baselines.MajorityLabelVoter

Majority vote label model.
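
Majority voting over a label matrix can be sketched in a few lines. The example assumes abstains are encoded as -1 and splits probability equally among tied classes; `majority_vote_probs` is a hypothetical helper, and Snorkel's own class may handle ties and abstains differently.

```python
from typing import List, Sequence

ABSTAIN = -1  # assumed encoding for "this LF did not vote"


def majority_vote_probs(
    label_matrix: Sequence[Sequence[int]], cardinality: int
) -> List[List[float]]:
    """Turn per-LF votes into class probabilities, one row per data point."""
    probs = []
    for row in label_matrix:
        counts = [0] * cardinality
        for label in row:
            if label != ABSTAIN:
                counts[label] += 1
        top = max(counts)
        # Every class that reaches the top count shares the probability mass
        winners = [1.0 if c == top else 0.0 for c in counts]
        total = sum(winners)
        probs.append([w / total for w in winners])
    return probs


L = [
    [0, 0, 1],     # two votes for class 0, one for class 1
    [1, -1, 0],    # one vote each: a tie
    [-1, -1, -1],  # every LF abstains
]
probs = majority_vote_probs(L, cardinality=2)
```

In this sketch a row where every LF abstains has all counts at zero, so every class ties and the result is a uniform distribution.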

lf.nlp.NLPLabelingFunction

Special labeling function type for spaCy-based LFs.

PandasLFApplier

LF applier for a Pandas DataFrame.

apply.dask.PandasParallelLFApplier

Parallel LF applier for a Pandas DataFrame.

model.baselines.RandomVoter

Random vote label model.

apply.spark.SparkLFApplier

LF applier for a Spark RDD.

lf.nlp_spark.SparkNLPLabelingFunction

Special labeling function type for spaCy-based LFs running on Spark.

filter_unlabeled_dataframe

Filter out examples not covered by any labeling function.
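
Filtering uncovered examples means dropping every row whose labeling functions all abstained. The sketch below works on plain lists instead of a DataFrame and assumes abstains are encoded as -1; `filter_unlabeled` is a hypothetical name, not Snorkel's function.

```python
from typing import List, Sequence, Tuple

ABSTAIN = -1  # assumed encoding for "this LF did not vote"


def filter_unlabeled(
    rows: Sequence[object], label_matrix: Sequence[Sequence[int]]
) -> Tuple[List[object], List[Sequence[int]]]:
    """Keep only rows where at least one LF returned a non-abstain label."""
    if len(rows) != len(label_matrix):
        raise ValueError("rows and label_matrix must have the same length")
    kept_rows, kept_labels = [], []
    for row, lf_outputs in zip(rows, label_matrix):
        if any(label != ABSTAIN for label in lf_outputs):
            kept_rows.append(row)
            kept_labels.append(lf_outputs)
    return kept_rows, kept_labels


rows = ["doc a", "doc b", "doc c"]
L = [[-1, -1], [0, -1], [-1, 1]]
kept_rows, kept_labels = filter_unlabeled(rows, L)
```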

labeling_function

Decorator to define a LabelingFunction object from a function.

lf.nlp.nlp_labeling_function

Decorator to define an NLPLabelingFunction object from a function.

lf.nlp_spark.spark_nlp_labeling_function

Decorator to define a SparkNLPLabelingFunction object from a function.

Snorkel Map Package

Generic utilities for data point to data point operations.

BaseMapper

Base class for Mapper and LambdaMapper.

LambdaMapper

Define a mapper from a function.

Mapper

Base class for any data point to data point mapping in the pipeline.

lambda_mapper

Decorate a function to define a LambdaMapper object.

spark.make_spark_mapper

Convert Mapper to be compatible with PySpark.

Snorkel Preprocess Package

Preprocessors for LFs, TFs, and SFs.

BasePreprocessor

alias of snorkel.map.core.BaseMapper

LambdaPreprocessor

Convenience class for defining preprocessors from functions.

Preprocessor

Base class for preprocessors.

nlp.SpacyPreprocessor

Preprocessor that parses input text via a spaCy model.

spark.make_spark_preprocessor

Convert Mapper to be compatible with PySpark.

preprocessor

Decorate functions to create preprocessors.

Snorkel Slicing Package

Programmatic data set slicing: SF creation, monitoring utilities, and representation learning for slices.

apply.dask.DaskSFApplier

SF applier for a Dask DataFrame.

sf.nlp.NLPSlicingFunction

Special slicing function type for spaCy-based SFs.

apply.dask.PandasParallelSFApplier

Parallel SF applier for a Pandas DataFrame.

PandasSFApplier

SF applier for a Pandas DataFrame.

SFApplier

SF applier for a list of data points.

SliceAwareClassifier

A slice-aware classifier that supports training + scoring on slice labels.

SliceCombinerModule

A module for combining the weighted representations learned by slices.

SlicingFunction

Base class for slicing functions.

apply.spark.SparkSFApplier

alias of snorkel.labeling.apply.spark.SparkLFApplier

add_slice_labels

Modify a dataloader in-place, adding labels for slice tasks.

convert_to_slice_tasks

Add slice labels to dataloader and create new slice tasks (including base slice).

sf.nlp.nlp_slicing_function

Decorator to define an NLPSlicingFunction child object from a function.

slice_dataframe

Return a dataframe with examples corresponding to specified SlicingFunction.
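
Conceptually, slicing keeps the examples for which a slicing function signals membership. The sketch below treats a slicing function as a plain boolean predicate over dictionary rows; `slice_rows` and `is_short` are hypothetical names, and no DataFrame or Snorkel class is involved.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def is_short(row: Row) -> bool:
    # Toy slicing predicate: the text has fewer than three words
    return len(str(row["text"]).split()) < 3


def slice_rows(rows: List[Row], slicing_fn: Callable[[Row], bool]) -> List[Row]:
    """Return the rows that belong to the slice, in their original order."""
    return [row for row in rows if slicing_fn(row)]


rows = [
    {"text": "great movie"},
    {"text": "this film was far too long"},
    {"text": "meh"},
]
short_rows = slice_rows(rows, is_short)
```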

slicing_function

Decorator to define a SlicingFunction object from a function.

Snorkel Utils Package

General machine learning utilities shared across Snorkel.

filter_labels

Filter out examples from arrays based on specified labels to filter.

preds_to_probs

Convert an array of predictions into an array of probabilistic labels.
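
One simple way to turn hard predictions into probabilistic labels is one-hot encoding, sketched below with plain lists. `one_hot_probs` is a hypothetical helper; the real function operates on arrays and may offer options not shown here.

```python
from typing import List, Sequence


def one_hot_probs(preds: Sequence[int], num_classes: int) -> List[List[float]]:
    """Give probability 1.0 to the predicted class and 0.0 to all others."""
    probs = []
    for pred in preds:
        if not 0 <= pred < num_classes:
            raise ValueError(f"Prediction {pred} is outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[pred] = 1.0
        probs.append(row)
    return probs


probs = one_hot_probs([0, 2], num_classes=3)
```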

probs_to_preds

Convert an array of probabilistic labels into an array of predictions.
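
Going from probabilistic labels back to predictions is an argmax over each row. The sketch below breaks ties by taking the lowest class index, which is an assumption of this example; `argmax_preds` is a hypothetical name, and Snorkel's own handling of ties may differ.

```python
from typing import List, Sequence


def argmax_preds(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable class per row; ties go to the lowest index."""
    # max() returns the first maximal element, so the lowest index wins ties
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


preds = argmax_preds([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])
```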

to_int_label_array

Convert an array to a (possibly flattened) array of ints.