DEPRECATED - Reading project labels#
This tutorial introduces a deprecated script to read labels from your Encord project. You are encouraged to use the tools introduced in the Working with the LabelRowV2 section instead.
Imports and authentication#
from dataclasses import dataclass
from functools import partial
from pathlib import Path
from typing import Callable, Generator, List, Optional
from encord import EncordUserClient
from encord.orm.project import Project as OrmProject
from encord.project import Project
from encord.project_ontology.object_type import ObjectShape
Note
To interact with Encord, you need to authenticate a client. You can find more details here.
# Authentication: adapt the following line to your private key path
private_key_path = Path.home() / ".ssh" / "id_ed25519"
with private_key_path.open() as f:
    private_key = f.read()
user_client = EncordUserClient.create_with_ssh_private_key(private_key)
# Find project to work with based on title.
project_orm: OrmProject = next(
    (
        p["project"]
        for p in user_client.get_projects(title_eq="The title of the project")
    )
)
project: Project = user_client.get_project(project_orm.project_hash)
1. The high-level view of your labels#
Project data is grouped into label_rows, which point to individual image groups or videos. Each label row will have its own label status, as not all label rows may be annotated at a given point in time.
Here is an example of listing the label status of a label row:
# Fetch one label row as an example.
for label_row in project.label_rows:
    print(label_row)
    break
Expected output:
{
"label_hash": "<label_hash>", # or None
"data_hash": "<data_hash>",
"dataset_hash": "<dataset_hash>",
"data_title": "<data_title>",
"data_type": "IMG_GROUP", # or VIDEO
"label_status": "NOT_LABELLED",
"annotation_task_status": "ASSIGNED"
}
From the high-level data, you can, for example, compute some statistics of the progress of your annotators:
status_counts = {}
for label_row in project.label_rows:
    status = label_row["annotation_task_status"]
    status_counts[status] = status_counts.setdefault(status, 0) + 1
print(status_counts)
Expected output:
{'RETURNED': 1, 'COMPLETED': 3, 'QUEUED': 20, 'IN_REVIEW': 3, 'ASSIGNED': 1}
2. Getting all label details#
The actual labels in the label rows are fetched by EncordClientProject.get_label_row(). This function will return a nested dictionary structure, with all details about classifications as well as objects.
In this section, we show how to build a list of all bounding boxes that have been reviewed and marked as approved.
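Before building that list, it can help to look at the shape of the returned dictionary. The sketch below (variable names are illustrative, and it assumes the project has at least one labelled row) fetches the details of the first labelled row and prints its top-level keys, such as data_units, object_answers and classification_answers:
# A quick look at the nested structure returned by get_label_row().
first_labelled_row = next(lr for lr in project.label_rows if lr["label_hash"])
example_details = project.get_label_row(first_labelled_row["label_hash"])
print(list(example_details.keys()))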
First, we define a data class to hold the information of interest.
@dataclass(frozen=True)
class AnnotationObject:
    object: dict
    file_name: str
    data_url: str
    frame: Optional[int] = None
Then we define a function which iterates over all objects of a label row fetched with EncordClientProject.get_label_row(). The function has a callable argument used to filter which objects should be returned.
def iterate_over_objects(
    label_row_details,
    include_object_fn: Callable[[dict], bool],
) -> Generator[AnnotationObject, None, None]:
    """
    Iterate over objects in a label row.

    :param label_row_details: the detailed label row to fetch objects from
    :param include_object_fn: A callable indicating whether to include an object.
    :return: Yields AnnotationObjects.
    """
    if label_row_details["data_type"] == "IMG_GROUP":
        # Image groups have multiple data_units (one for each image)
        for data_unit in label_row_details["data_units"].values():
            url = data_unit["data_link"]
            file_name = data_unit["data_title"]
            objects = data_unit["labels"]["objects"]
            for object in objects:
                if include_object_fn(object):
                    yield AnnotationObject(object, file_name, url)
    else:
        # Videos have a single data unit, but multiple frames.
        # Need to iterate through frames instead.
        data_unit = list(label_row_details["data_units"].values())[0]
        url = data_unit["data_link"]
        file_name = data_unit["data_title"]
        for frame, labels in data_unit["labels"].items():
            for object in labels["objects"]:
                if include_object_fn(object):
                    yield AnnotationObject(object, file_name, url, int(frame))
Then we define a function that chooses which objects to include.
def include_object_fn_base(
    object: dict,
    object_type: Optional[ObjectShape] = None,
    only_approved: bool = True,
) -> bool:
    # Filter object type
    if object_type and object["shape"].lower() != object_type.value.lower():
        return False

    # Filter reviewed status: skip objects with no reviews or whose latest
    # review was not approved.
    if only_approved and (
        not object["reviews"] or not object["reviews"][-1]["approved"]
    ):
        return False

    return True


# Trick to preselect object_type
include_object_fn_bbox: Callable[[dict], bool] = partial(
    include_object_fn_base, object_type=ObjectShape.BOUNDING_BOX
)
Now we can use the iterator and the filter to collect the objects.
reviewed_bounding_boxes: List[AnnotationObject] = []
for label_row in project.label_rows:
    if not label_row["label_hash"]:  # No objects in this label row yet.
        continue

    # Only set the `include_reviews` flag to `True` if the reviews payload is needed.
    label_row_details = project.get_label_row(
        label_row["label_hash"], include_reviews=True
    )
    reviewed_bounding_boxes += list(
        iterate_over_objects(label_row_details, include_object_fn_bbox)
    )
print(reviewed_bounding_boxes)
Expected output:
[
    AnnotationObject(
        object={
            "name": "Name of the object annotated",
            "color": "#FE9200",
            "shape": "bounding_box",
            "value": "name_of_the_object_annotated",
            "createdAt": "Wed, 18 May 2022 14:07:14 GMT",
            "createdBy": "annotator1@your.domain",
            "confidence": 1,
            "objectHash": "<object_hash>",
            "featureHash": "<feature_hash>",
            "manualAnnotation": True,
            "boundingBox": {
                "h": 0.8427,
                "w": 0.5857,
                "x": 0.3134,
                "y": 0.1059,
            },
            "reviews": [
                {
                    "exists": True,
                    "comment": None,
                    "approved": True,
                    "instance": {
                        "name": "nested_classifications",
                        "range": [[0, 0]],
                        "shape": "bounding_box",
                        "objectHash": "<object_hash>",
                        "featureHash": "<feature_hash>",
                        "classifications": [],
                    },
                    "createdAt": "Wed, 18 May 2022 14:07:42 GMT",
                    "createdBy": "reviewer1@your.domain",
                    "rejections": None,
                }
            ],
        },
        file_name="your_file_name.jpg",
        frame=None,  # or a number if video annotation
        data_url="<signed_link_to_data>",
    )
    # ...
]
From this template, it is possible to extract various subsets of objects by changing arguments to include_object_fn_base. For example, getting all polygons is done by changing the object_type argument to ObjectShape.POLYGON, as sketched below.
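The following sketch shows this change; the names include_object_fn_polygon and reviewed_polygons are illustrative, and the collection loop mirrors the bounding box example above.
# Same base filter, preselected for polygons instead of bounding boxes.
include_object_fn_polygon: Callable[[dict], bool] = partial(
    include_object_fn_base, object_type=ObjectShape.POLYGON
)

reviewed_polygons: List[AnnotationObject] = []
for label_row in project.label_rows:
    if not label_row["label_hash"]:
        continue
    label_row_details = project.get_label_row(
        label_row["label_hash"], include_reviews=True
    )
    reviewed_polygons += list(
        iterate_over_objects(label_row_details, include_object_fn_polygon)
    )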
Similarly, you can define your own filtering function to replace include_object_fn_base and select only the objects that you need for your purpose.
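For example, a filter that keeps only manually created annotations could look like the sketch below; it relies on the manualAnnotation field shown in the expected output above.
def include_manual_annotations(object: dict) -> bool:
    # Keep only objects where manualAnnotation is True.
    return bool(object["manualAnnotation"])
# Pass it to iterate_over_objects in place of include_object_fn_bbox.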
Finally, if you want to get classifications rather than objects, you will have to change the "objects" dictionary lookups in iterate_over_objects to "classifications" and compose a new filtering function.
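A minimal sketch of such a variant for image groups is shown below. It assumes that each data unit exposes its classifications under data_unit["labels"]["classifications"], next to the "objects" list used above, and it accepts every classification by default.
def iterate_over_classifications(
    label_row_details,
    include_classification_fn: Callable[[dict], bool] = lambda c: True,
) -> Generator[dict, None, None]:
    # Image group variant; for videos, iterate the per-frame labels as in
    # iterate_over_objects above.
    for data_unit in label_row_details["data_units"].values():
        for classification in data_unit["labels"]["classifications"]:
            if include_classification_fn(classification):
                yield classification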
3. Fetching nested classifications#
It is possible to make nested classifications on objects. The information about such nested classifications is stored in the classification_answers, object_answers and object_actions sections of the label_row_details.
Assuming that the reviewed bounding boxes fetched above have nested attributes, the following code example shows how to get the nested classification information.
print(
label_row_details["object_answers"][
reviewed_bounding_boxes[-1].object["objectHash"]
]
)
Expected output:
{
    'classifications': [
        {
            'answers': [
                {
                    'featureHash': '<nested_feature_hash2>',
                    'name': 'nested option 1',
                    'value': 'nested_option_1'
                }
            ],
            'featureHash': '<nested_feature_hash>',
            'manualAnnotation': True,
            'name': 'Nested classification.',
            'value': 'nested_classification.'
        }
    ],
    'objectHash': 'e413a414'
}
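The other sections mentioned above can be inspected in the same way, for example:
# Print the classification answers and object actions of the same label row.
print(label_row_details["classification_answers"])
print(label_row_details["object_actions"])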