On this page you'll find explanations of all the key concepts and entities on the Humanloop platform. These should provide an intuitive way of thinking about your NLP project from inception to delivery.
Within the Humanloop platform, a project is a combination of datasets, annotators, and usually a model. When you create a project, you designate one or more inputs and an output. These tell the platform what shape your data will be in.
A dataset is a collection of datapoints suitable for training an NLP model. Each datapoint contains multiple fields; think of each datapoint as a row of a spreadsheet. For example, a sentiment analysis dataset of movie reviews might have two fields, 'text' and 'sentiment', with 'text' holding the review itself and 'sentiment' holding the label.
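As a minimal sketch, the movie review dataset described above could be pictured as a list of dictionaries. This in-memory representation and the `field_names` helper are illustrative assumptions, not the platform's actual storage or upload format.

```python
# Hypothetical in-memory picture of a dataset: one dict per datapoint,
# one key per field. Not Humanloop's actual format.
dataset = [
    {"text": "A moving, beautifully shot film.", "sentiment": "positive"},
    {"text": "Two hours I will never get back.", "sentiment": "negative"},
]


def field_names(datapoints):
    """Return the sorted field names that every datapoint has in common."""
    names = set(datapoints[0])
    for datapoint in datapoints[1:]:
        names &= set(datapoint)
    return sorted(names)


print(field_names(dataset))  # ['sentiment', 'text']
```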
During project creation, one or more datasets can be associated with the project (they can also be added or removed later). In this way you can incrementally build up your model based on new information or changes to the problem. This is handled via the project's inputs and outputs: differently named columns in distinct datasets can be mapped to the same project input, which allows for flexible dataset specification and configuration changes.
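The column mapping idea can be sketched as follows. The function `map_to_project_inputs`, the column names, and the sample rows are all hypothetical; in the platform this mapping is configured on the project rather than written in code.

```python
def map_to_project_inputs(rows, column_map):
    """Rename dataset columns to project field names.

    `column_map` maps a dataset column name to a project input/output name.
    Columns that are not listed in `column_map` are dropped.
    """
    return [
        {project_field: row[column] for column, project_field in column_map.items()}
        for row in rows
    ]


# Two datasets that name their columns differently (made-up examples).
reviews_a = [{"review": "Great fun.", "label": "positive"}]
reviews_b = [{"body": "Dull and overlong.", "rating": "negative"}]

# Both are mapped onto the same project input ('text') and output ('sentiment').
combined = (
    map_to_project_inputs(reviews_a, {"review": "text", "label": "sentiment"})
    + map_to_project_inputs(reviews_b, {"body": "text", "rating": "sentiment"})
)
print(combined)
```

After mapping, every row has the same shape regardless of which dataset it came from, which is what lets datasets be added to or removed from a project over time.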
Annotators are the people involved in your project, either by annotating data directly or as part of a review process. Projects start out with just one annotator (the project owner), but additional annotators can be invited via email address.
Annotators typically create annotations via the annotation interface, which also allows them to leave comments for other users or flag particular datapoints as in need of investigation.
Humanloop allows each datapoint within a project to be annotated one or more times. To capture this, we have the concept of a task, which is a pairing of a single datapoint with an annotator. For example, a project with 500 datapoints, each to be annotated by 3 annotators, would have 1500 tasks to complete.
Tasks also record status information about annotation work against a particular datapoint. A complete task is one that the annotator has marked as having all relevant annotation information created. Completed tasks are used in model training and evaluation.
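The arithmetic in this example can be checked with a short sketch that builds every (datapoint, annotator) pair. The annotator names are placeholders.

```python
from itertools import product

datapoint_ids = range(500)
annotators = ["alice", "bob", "carol"]  # placeholder names

# One task per (datapoint, annotator) pair.
tasks = list(product(datapoint_ids, annotators))

print(len(tasks))  # 1500
```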
Task creation is governed by a project's configuration. In the settings of a project it is possible to choose what fraction of the data should be annotated, and to reassign annotation work between annotators.
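To make the effect of these settings concrete, here is a rough sketch of one possible allocation scheme for 2 annotators with 50% of the data doubly annotated. The round-robin strategy, the `allocate_tasks` function, and its parameters are assumptions made for illustration; the platform's actual allocation logic is not described here and may differ.

```python
def allocate_tasks(datapoint_ids, annotators, double_fraction):
    """Return a list of (datapoint_id, annotator) tasks.

    Assumed scheme: the first `double_fraction` of the datapoints are given
    to two different annotators; the rest are given to one annotator each,
    assigned round-robin. Requires at least two annotators.
    """
    ids = list(datapoint_ids)
    n_double = int(len(ids) * double_fraction)
    tasks = []
    for i, datapoint_id in enumerate(ids):
        first = annotators[i % len(annotators)]
        tasks.append((datapoint_id, first))
        if i < n_double:
            # Pick the next annotator in the rotation as the second annotator.
            second = annotators[(i + 1) % len(annotators)]
            tasks.append((datapoint_id, second))
    return tasks


tasks = allocate_tasks(range(10), ["alice", "bob"], double_fraction=0.5)
print(len(tasks))  # 15: 5 datapoints x 2 annotators + 5 datapoints x 1 annotator
```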
For example, the image below shows tasks created and allocated in a project with 2 annotators where 50% of the data should be doubly annotated:
Models are often, but not always, the desired result of a Humanloop project. Using labeled data provided by the annotators, a machine learning model is trained that can predict labels on unseen examples.
Model training within the Humanloop platform happens without the need for user intervention. While human annotation is still ongoing, the model's predictions are made available in the annotation interface as soon as they reach a quality benchmark, which speeds up labeling. Based on the model's performance, the most useful training examples are presented to annotators first, so that higher model performance can be reached faster.
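One common way to decide which examples are "most useful" is uncertainty sampling, where the datapoints the model is least sure about are shown first. The sketch below illustrates that general technique using prediction entropy; it is an assumption for illustration, since this page does not state which selection strategy the platform uses. The datapoint ids and probabilities are made up.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions):
    """Order datapoint ids from most to least uncertain prediction.

    `predictions` maps a datapoint id to a list of class probabilities.
    """
    return sorted(predictions, key=lambda dp: entropy(predictions[dp]), reverse=True)


predictions = {
    "a": [0.98, 0.02],  # model is confident
    "b": [0.55, 0.45],  # model is unsure
    "c": [0.80, 0.20],
}
print(rank_by_uncertainty(predictions))  # ['b', 'c', 'a']
```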
The Humanloop platform currently supports training models for the following tasks:
- Multilabel classification
- Sequence tagging
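To illustrate the difference between the two task types, the sketch below shows what labeled examples might look like for each. The field names (`labels`, `spans`, `start`, `end`) and the label values are hypothetical and chosen for readability; they are not the platform's schema.

```python
# Multilabel classification: a whole text receives zero or more labels.
multilabel_example = {
    "text": "Great acting, but the plot drags.",
    "labels": ["acting", "plot"],
}

# Sequence tagging: labels are attached to character spans within the text.
sequence_tagging_example = {
    "text": "Ada Lovelace was born in London.",
    "spans": [
        {"start": 0, "end": 12, "label": "PER"},
        {"start": 25, "end": 31, "label": "LOC"},
    ],
}


def span_texts(example):
    """Return (substring, label) for each tagged span."""
    return [
        (example["text"][span["start"]:span["end"]], span["label"])
        for span in example["spans"]
    ]


print(span_texts(sequence_tagging_example))
# [('Ada Lovelace', 'PER'), ('London', 'LOC')]
```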
Once a model is available, it's possible to invoke it via API to integrate it into complex machine learning workflows. This API integration can be tested within a project's page.