Reading project labels

Use this script to read labels from your Encord project.

Imports and authentication

from dataclasses import dataclass
from functools import partial
from pathlib import Path
from typing import Callable, Generator, List, Optional

from encord import EncordUserClient
from encord.orm.project import Project as OrmProject
from encord.project import Project
from encord.project_ontology.object_type import ObjectShape

Note

To interact with Encord, you need to authenticate a client. You can find more details here.

# Authentication: adapt the following line to your private key path
private_key_path = Path.home() / ".ssh" / "id_ed25519"

with private_key_path.open() as f:
    private_key = f.read()

user_client = EncordUserClient.create_with_ssh_private_key(private_key)

# Find project to work with based on title.
project_orm: OrmProject = next(
    (
        p["project"]
        for p in user_client.get_projects(title_eq="The title of the project")
    )
)
project: Project = user_client.get_project(project_orm.project_hash)
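
Note that next() raises a bare StopIteration when no project matches the title, which can be hard to diagnose. The sketch below shows one way to turn that into a readable error. It runs on a hand-written list standing in for the result of get_projects(); the helper name find_project_entry and the sample data are illustrative assumptions, not part of the Encord SDK.

```python
from typing import Any, Iterable


def find_project_entry(projects: Iterable[dict]) -> Any:
    """Return the "project" entry of the first result, or raise a readable error."""
    # Passing a default to next() avoids an uncaught StopIteration.
    entry = next((p["project"] for p in projects), None)
    if entry is None:
        raise LookupError("No project matched the given title.")
    return entry


# Hypothetical stand-in for user_client.get_projects(title_eq=...).
sample_projects = [{"project": {"title": "The title of the project"}}]
print(find_project_entry(sample_projects))
```

With the real client, the list passed in would be the return value of user_client.get_projects(title_eq=...).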

1. The high-level view of your labels

Project data is grouped into label_rows, each of which points to an individual image group or video. Every label row has its own label status, since not every label row will have been annotated at a given point in time.

Here is an example of listing the label status of a label row:

# Fetch one label row as an example.
for label_row in project.label_rows:
    print(label_row)
    break

Expected output:

{
   "label_hash": "<label_hash>",  # or None
   "data_hash": "<data_hash>",
   "dataset_hash": "<dataset_hash>",
   "data_title": "<data_title>",
   "data_type": "IMG_GROUP",  # or VIDEO
   "label_status": "NOT_LABELLED",
   "annotation_task_status": "ASSIGNED"
}

From this high-level data you can, for example, compute statistics on your annotators' progress:

status_counts = {}
for label_row in project.label_rows:
    status = label_row["annotation_task_status"]
    status_counts[status] = status_counts.setdefault(status, 0) + 1
print(status_counts)

Expected output:

{'RETURNED': 1, 'COMPLETED': 3, 'QUEUED': 20, 'IN_REVIEW': 3, 'ASSIGNED': 1}
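
The same tally can be written with collections.Counter, which also makes it easy to derive a completion fraction. The sketch below runs on a hand-written list of dictionaries standing in for project.label_rows; the sample rows and the helper name summarize_progress are illustrative assumptions, not part of the Encord SDK.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_progress(label_rows: Iterable[dict]) -> Tuple[Dict[str, int], float]:
    """Count rows per annotation_task_status and return the completed fraction."""
    counts = Counter(row["annotation_task_status"] for row in label_rows)
    total = sum(counts.values())
    # Guard against an empty project to avoid dividing by zero.
    completed_fraction = counts.get("COMPLETED", 0) / total if total else 0.0
    return dict(counts), completed_fraction


# Hypothetical stand-in for project.label_rows.
sample_label_rows = [
    {"annotation_task_status": "COMPLETED"},
    {"annotation_task_status": "QUEUED"},
    {"annotation_task_status": "QUEUED"},
    {"annotation_task_status": "IN_REVIEW"},
]

sample_counts, completed_fraction = summarize_progress(sample_label_rows)
print(sample_counts)
print(f"{completed_fraction:.0%} completed")
```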

2. Getting all label details

The actual labels in a label row are fetched with Project.get_label_row(). This function returns a nested dictionary structure containing all details about both classifications and objects. In this section, we show how to build a list of all bounding boxes that have been reviewed and marked as approved.

First, we define a data class to hold the information of interest.

@dataclass(frozen=True)
class AnnotationObject:
    object: dict
    file_name: str
    data_url: str
    frame: Optional[int] = None

Then we define a function that iterates over all objects of a label row fetched with Project.get_label_row(). The function takes a callable argument that decides which objects are returned.

def iterate_over_objects(
    label_row_details,
    include_object_fn: Callable[[dict], bool],
) -> Generator[AnnotationObject, None, None]:
    """
    Iterate over objects in a label row.

    :param label_row_details: the detailed label row to fetch objects from
    :param include_object_fn: A callable indicating whether to include an object.
    :return: Yields AnnotationObjects.
    """
    # Read the data type from the argument, not from a global variable.
    if label_row_details["data_type"] == "IMG_GROUP":
        # Image groups have multiple data_units (one for each image)
        for data_unit in label_row_details["data_units"].values():
            url = data_unit["data_link"]
            file_name = data_unit["data_title"]
            objects = data_unit["labels"]["objects"]
            for object in objects:
                if include_object_fn(object):
                    yield AnnotationObject(object, file_name, url)

    else:
        # Videos have a single data unit, but multiple frames.
        # Need to iterate through frames instead.
        data_unit = list(label_row_details["data_units"].values())[0]

        url = data_unit["data_link"]
        file_name = data_unit["data_title"]
        for frame, labels in data_unit["labels"].items():
            for object in labels["objects"]:
                if include_object_fn(object):
                    yield AnnotationObject(object, file_name, url, frame)

Next, we define a function that decides which objects to include.

def include_object_fn_base(
    object: dict,
    object_type: Optional[ObjectShape] = None,
    only_approved: bool = True,
) -> bool:
    # Filter object type
    if object_type and object["shape"].lower() != object_type.value.lower():
        return False

    # Filter reviewed status. The parentheses matter: without them, `and`
    # binds tighter than `or`, so the last review would be inspected even
    # when only_approved is False or when there are no reviews at all.
    if only_approved and (
        not object["reviews"] or not object["reviews"][-1]["approved"]
    ):
        return False

    return True


# Trick to preselect object_type
include_object_fn_bbox: Callable[[dict], bool] = partial(
    include_object_fn_base, object_type=ObjectShape.BOUNDING_BOX
)

Now we can use the iterator and the filter to collect the objects.

reviewed_bounding_boxes: List[AnnotationObject] = []
for label_row in project.label_rows:
    if not label_row["label_hash"]:  # No objects in this label row yet.
        continue

    label_row_details = project.get_label_row(label_row["label_hash"])
    reviewed_bounding_boxes += list(
        iterate_over_objects(label_row_details, include_object_fn_bbox)
    )

print(reviewed_bounding_boxes)

Expected output:

[
    AnnotationObject(
        object={
            "name": "Name of the object annotated",
            "color": "#FE9200",
            "shape": "bounding_box",
            "value": "name_of_the_object_annotated",
            "createdAt": "Wed, 18 May 2022 14:07:14 GMT",
            "createdBy": "annotator1@your.domain",
            "confidence": 1,
            "objectHash": "<object_hash>",
            "featureHash": "<feature_hash>",
            "manualAnnotation": True,
            "boundingBox": {
                "h": 0.8427,
                "w": 0.5857,
                "x": 0.3134,
                "y": 0.1059,
            },
            "reviews": [
                {
                    "exists": True,
                    "comment": None,
                    "approved": True,
                    "instance": {
                        "name": "nested_classifications",
                        "range": [[0, 0]],
                        "shape": "bounding_box",
                        "objectHash": "<object_hash>",
                        "featureHash": "<feature_hash>",
                        "classifications": [],
                    },
                    "createdAt": "Wed, 18 May 2022 14:07:42 GMT",
                    "createdBy": "reviewer1@your.domain",
                    "rejections": None,
                }
            ],
        },
        file_name="your_file_name.jpg",
        frame=None,  # or a number if video annotation,
        data_url="<signed_link_to_data>",
    )
    # ...
]
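
The boundingBox entry in the output above holds four numbers between 0 and 1. A minimal sketch of converting such a box to pixel coordinates follows, assuming that x and y mark the top-left corner and that all four values are fractions of the image width and height. The helper name bounding_box_to_pixels is hypothetical, and the image size is not part of the label data shown here, so the caller has to supply it.

```python
from typing import Dict, Tuple


def bounding_box_to_pixels(
    box: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a relative box to (left, top, right, bottom) pixel coordinates."""
    # Assumption: x/y are the top-left corner, w/h the extent, all relative.
    left = round(box["x"] * image_width)
    top = round(box["y"] * image_height)
    right = round((box["x"] + box["w"]) * image_width)
    bottom = round((box["y"] + box["h"]) * image_height)
    return left, top, right, bottom


# Values copied from the expected output above; the image size is made up.
box = {"h": 0.8427, "w": 0.5857, "x": 0.3134, "y": 0.1059}
pixel_box = bounding_box_to_pixels(box, image_width=1000, image_height=1000)
print(pixel_box)
```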

From the template in this section, you can extract various subsets of objects by changing the arguments passed to include_object_fn_base. For example, to get all polygons, change the object_type argument to ObjectShape.POLYGON. Similarly, you can replace include_object_fn_base with your own filtering function to select only the objects you need. Finally, if you want classifications rather than objects, change the two "objects" dictionary lookups in iterate_over_objects (one in the image group branch, one in the video branch) to "classifications" and write a new filtering function.
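
As an illustration of a custom filter, the sketch below keeps only objects created by a given annotator with at least a given confidence. It relies on the createdBy and confidence keys seen in the expected output above; the function name include_object_fn_custom, its parameters, and the sample objects are assumptions made for this example.

```python
from functools import partial
from typing import Callable, Optional


def include_object_fn_custom(
    obj: dict,
    created_by: Optional[str] = None,
    min_confidence: float = 0.0,
) -> bool:
    """Keep objects from one annotator with at least the given confidence."""
    if created_by is not None and obj.get("createdBy") != created_by:
        return False
    # Objects without a confidence value are treated as confidence 0.
    return obj.get("confidence", 0.0) >= min_confidence


# Preselect the annotator, mirroring the partial() trick used earlier.
include_annotator1: Callable[[dict], bool] = partial(
    include_object_fn_custom, created_by="annotator1@your.domain"
)

# Hypothetical objects with only the keys this filter reads.
sample_objects = [
    {"createdBy": "annotator1@your.domain", "confidence": 1},
    {"createdBy": "annotator2@your.domain", "confidence": 1},
]
decisions = [include_annotator1(o) for o in sample_objects]
print(decisions)
```

A filter like include_annotator1 has the same Callable[[dict], bool] shape as include_object_fn_bbox, so it could be passed to iterate_over_objects in its place.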

3. Fetching nested classifications

It is possible to make nested classifications on objects. The information about such nested classifications is stored in the classification_answers, object_answers and object_actions sections of the label_row_details.

Assuming that the reviewed bounding boxes fetched above have nested attributes, the following code example shows how to get the nested classification information.

print(
    label_row_details["object_answers"][
        reviewed_bounding_boxes[-1].object["objectHash"]
    ]
)

Expected output:

{
    'classifications': [
        {
            'answers':
                [
                    {
                        'featureHash': '<nested_feature_hash2>',
                        'name': 'nested option 1',
                        'value': 'nested_option_1'
                    }
                ],
            'featureHash': '<nested_feature_hash>',
            'manualAnnotation': True,
            'name': 'Nested classification.',
            'value': 'nested_classification.'
        }
    ],
    'objectHash': 'e413a414'
}
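
To work with these answers programmatically, it can help to flatten the structure into a simple mapping. The sketch below maps each nested classification name to the names of its selected answers, based only on the dictionary shape printed above. The helper name summarize_nested_answers is hypothetical, and handling a non-list answers value as plain text is an assumption rather than documented behaviour.

```python
from typing import Dict, List


def summarize_nested_answers(object_answer: dict) -> Dict[str, List[str]]:
    """Map each nested classification name to the names of its answers."""
    summary: Dict[str, List[str]] = {}
    for classification in object_answer.get("classifications", []):
        answers = classification.get("answers", [])
        if isinstance(answers, list):
            summary[classification["name"]] = [a["name"] for a in answers]
        else:
            # Assumption: a non-list value is a free-text answer.
            summary[classification["name"]] = [str(answers)]
    return summary


# Same shape as the expected output above, with placeholder hashes.
sample_object_answer = {
    "classifications": [
        {
            "answers": [
                {
                    "featureHash": "<nested_feature_hash2>",
                    "name": "nested option 1",
                    "value": "nested_option_1",
                }
            ],
            "featureHash": "<nested_feature_hash>",
            "manualAnnotation": True,
            "name": "Nested classification.",
            "value": "nested_classification.",
        }
    ],
    "objectHash": "<object_hash>",
}
nested_summary = summarize_nested_answers(sample_object_answer)
print(nested_summary)
```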
