API Reference

Data Cube Warp

cube_builder.utils.processing.merge(merge_file: str, mask: dict, assets: List[dict], band: str, band_map: dict, quality_band: str, collection: str, build_provenance=False, compute=False, native_grid: bool = False, **kwargs)

Apply datacube merge scenes.

The Merge (or Warp) is a procedure that crops and mosaics all images that overlap a target tile of the common grid, for a specific date.

Parameters:
  • merge_file – Path to store data cube merge

  • assets – List of collection assets for the period

  • band – Merge band name

  • band_map – Map of cube band name and common name.

  • build_provenance – Build a provenance file for Merge (Used in combined collections)

  • **kwargs – Extra properties

Data Cube Composition

cube_builder.utils.processing.blend(activity, band_map, quality_band, build_clear_observation=False, block_size=None, reuse_data_cube=None, apply_valid_range=None, **kwargs)

Apply blend and generate raster from activity.

Basically, the blend operation stacks all the images (merges) in a period. The stack is ordered by best pixel image (best clear ratio). Cloud pixels are masked with the numpy.ma module, which enables temporal composite functions such as MEDIAN and AVG to be applied over these rasters.

The following example represents a Landsat-8 16-day data cube using the functions Best Pixel (Stack - LCF) and Median (MED) over the 16-day period from 1/1 to 16/1. Images from 10/1 and 15/1 were found, with the values described below:

10/1
Quality                Nir

0 0 2 4         702  876 7000 9000
0 1 1 4         687  444  421 9113      =>  Clear Ratio = 50%
0 2 2 4        1241 1548 2111 1987      =>  Cloud Ratio = 50%

15/1
Quality           Nir
0 0 255 255     854 756 9800 9454
0 1   1   1     945 400  402  422       =>  Clear Ratio ~= 83%
0 0   0   0     869 975  788  799       =>  Cloud Ratio ~= 0%

According to the Brazil Data Cube User Guide, the best image is 15/1 (clear ratio ~83%) and the worst is 10/1 (50%). The resulting data cube will be:

Landsat-8_30_16D_LCF
Quality        Nir                     Provenance (Day of Year)

0 0 2 4       854 756 7000 9000      15 15 10 10
0 1 1 1       945 400  402  422      15 15 15 15
0 0 0 0       869 975  788  799      15 15 15 15

Landsat-8_30_16D_MED
Nir

778  816 -9999 -9999
816  422   402   422
1055 975   788   799

Note

When build_clear_observation is set, make sure not to execute in parallel, since it is not thread-safe. The provenance band is not generated for MEDIAN products. For nodata pixels in the best image, the cube builder tries to find a useful pixel in the next observation; it may be cloud/cloud shadow (when there is no valid pixel 0 or 1), otherwise it is filled as nodata.

Parameters:
  • activity – Prepared blend activity metadata

  • band_map – Map of data cube bands (common_name : name)

  • build_clear_observation – Flag to dispatch generation of Clear Observation band. It is not thread-safe.

Returns:

A processed activity with the generated values.
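The stacking described above can be sketched with the numpy.ma module (a simplified illustration only, with invented arrays and clear/cloud codes; it is not the cube builder's actual implementation):

```python
import numpy

# Two merges of the same tile; codes 0/1 mean clear, 2/4 mean cloud (hypothetical)
quality_10 = numpy.array([[0, 2], [4, 0]])
nir_10 = numpy.array([[702, 7000], [9000, 687]], dtype=float)
quality_15 = numpy.array([[0, 0], [4, 1]])
nir_15 = numpy.array([[854, 756], [9454, 400]], dtype=float)

CLEAR = (0, 1)
# Mask every pixel whose quality code is not clear, then stack the observations
stack = numpy.ma.stack([
    numpy.ma.masked_where(~numpy.isin(quality_10, CLEAR), nir_10),
    numpy.ma.masked_where(~numpy.isin(quality_15, CLEAR), nir_15),
])

# MED composite: per-pixel median over the clear observations only;
# pixels cloudy on every date stay masked (published as nodata)
med = numpy.ma.median(stack, axis=0)
```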

Utils for Image Operation

cube_builder.utils.processing.compute_data_set_stats(file_path: str, mask: dict, compute: bool = True) Tuple[float, float]

Compute data set efficacy and cloud ratio.

It opens the given file_path and calculates the mask statistics, such as efficacy and cloud ratio.

Parameters:
  • file_path – Path to given data set

  • data_set_name – Data set name

Returns:

Tuple consisting of efficacy and cloud ratio, respectively.
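As a rough sketch of those statistics (the mask codes and exact ratio definitions below are assumptions for illustration; the real function reads the raster at file_path and uses the mask dict):

```python
import numpy

# Hypothetical quality mask: 0 = nodata, 1 = clear, 4 = cloud
mask = numpy.array([[1, 1, 4],
                    [0, 1, 4]])

valid = mask != 0                 # pixels with any observation
clear = mask == 1                 # clear-sky pixels
cloud = mask == 4                 # cloudy pixels

efficacy = 100 * clear.sum() / mask.size       # share of clear pixels in the scene
cloud_ratio = 100 * cloud.sum() / valid.sum()  # share of cloud among valid pixels
```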

Define a utility to validate merge images.

class cube_builder.utils.image.QAConfidence(cloud=None, cloud_shadow=None, cirrus=None, snow=None, oli=None)

Type for Quality Assessment definition for Landsat Collection 2.

These properties will be evaluated in the Python virtual machine, for example:

# Discard all cloud values with confidence greater than or equal to MEDIUM.
qa = QAConfidence(cloud='cloud >= MEDIUM', cloud_shadow=None, cirrus=None, snow=None)
apply(data)

Apply the Bit confidence to the Quality Assessment mask.

Parameters:

data (numpy.ma.MaskedArray) – The Masked raster QA Pixel

Returns:

numpy.ma.MaskedArray - The masked pixels which satisfy the confidence attributes.

cirrus: str | None

Represents the Cirrus Confidence.

cloud: str | None

Represents the Cloud Confidence.

cloud_shadow: str | None

Represents the Cloud Shadow Confidence.

oli: bool | ndarray

Flag to identify the OLI/TIRS satellite.

snow: str | None

Represents the Snow/Ice Confidence.

class cube_builder.utils.image.SmartDataSet(file_path: str, mode='r', tags=None, **properties)

Utility class to auto-close a rasterio data set.

This helper class avoids memory leaks from data sets left open in memory.

close()

Close rasterio data set.
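The auto-close pattern can be sketched independently of rasterio (an illustration of the idea using an in-memory stream; the class name is invented and this is not the actual implementation):

```python
import io

class AutoCloseDataSet:
    """Release the wrapped data set when close() is called or the object dies."""

    def __init__(self, dataset):
        self.dataset = dataset

    def close(self):
        if self.dataset is not None and not self.dataset.closed:
            self.dataset.close()
        self.dataset = None

    def __del__(self):
        # Ensure the underlying handle is released even if close() was never called
        self.close()

# An in-memory stream stands in for a rasterio data set here
buf = io.StringIO("raster bytes")
ds = AutoCloseDataSet(buf)
ds.close()
```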

cube_builder.utils.image.check_file_integrity(file_path: str | Path, read_bytes: bool = False) bool

Check Raster File integrity.

Parameters:
  • file_path (str|Path) – Path to the raster file

  • read_bytes (bool) – Read the entire raster band. Default is False

cube_builder.utils.image.create_empty_raster(location: str, proj4: str, dtype: str, xmin: float, ymax: float, resolution: List[float], dist: List[float], nodata: float, cog=True)

Create a data set filled with nodata.

This method generates an empty scene so that the data cube timeline is followed.

Parameters:
  • location (str) – Path where file will be generated.

  • proj4 (str) – Proj4 with Coordinate Reference System.

  • dtype (str) – Data type

  • xmin (float) – Image minx (Related to geotransform)

  • ymax (float) – Image ymax

  • resolution (List[float]) – Pixel resolution (X, Y)

  • dist (List[float]) – The distance of X, Y (Scene offset)

  • nodata (float) – Scene nodata.

  • cog (bool) – Flag to generate a Cloud Optimized GeoTIFF. Default is True.

cube_builder.utils.image.extract_qa_bits(band_data, bit_location)

Retrieve the bit information from given position.

Parameters:
  • band_data (int|numpy.ma.masked_array) –

  • bit_location (int) –
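A single-bit extraction of this kind typically shifts the value right and keeps the lowest bit (an assumed sketch of the bitwise operation; the real implementation may differ):

```python
import numpy

def extract_bit(band_data, bit_location):
    # Shift the target bit into position 0, then keep only that bit
    return (band_data >> bit_location) & 1

qa = numpy.array([0b10110])  # bits 1, 2 and 4 are set
```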

cube_builder.utils.image.generate_cogs(input_data_set_path, file_path, profile='deflate', block_size=None, profile_options=None, **options)

Generate Cloud Optimized GeoTIFF files (COG).

Parameters:
  • input_data_set_path (str) –

  • file_path (str) –

  • profile (str) –

  • profile_options (dict) –

  • block_size (int) –

Returns:

Path to COG.

cube_builder.utils.image.get_qa_mask(data: MaskedArray, clear_data: List[float] | None = None, not_clear_data: List[float] | None = None, nodata: float | None = None, confidence: QAConfidence | None = None) MaskedArray

Extract Quality Assessment Bits from Landsat Collection 2 Level-2 products.

This method uses the bitwise operation to extract bits according to the document Landsat 8 Collection 2 (C2) Level 2 Science Product (L2SP) Guide, page 13.

Note

This method supports Landsat-4 and later Collection 2 Science products. Please take a look at cube_builder.utils.image.QAConfidence if you are dealing with multiple sensors.

Example

>>> import numpy
>>> from cube_builder.utils.image import QAConfidence, get_qa_mask
>>> mid_cloud_confidence = QAConfidence(cloud='cloud == MEDIUM', cloud_shadow=None, cirrus=None, snow=None, oli=True)
>>> clear = [6, 7]  # Clear and Water
>>> not_clear = [1, 2, 3, 4]  # Dilated Cloud, Cirrus, Cloud, Cloud Shadow
>>> get_qa_mask(numpy.ma.array([22080], dtype=numpy.int16, fill_value=1),
...             clear_data=clear, not_clear_data=not_clear,
...             nodata=1, confidence=mid_cloud_confidence)
masked_array(data=[--],
             mask=[ True],
       fill_value=1,
            dtype=int16)
>>> # When no cloud confidence is set, this value will be Clear since the Cloud Pixel bit is off.
>>> get_qa_mask(numpy.ma.array([22080], dtype=numpy.int16, fill_value=1),
...             clear_data=clear, not_clear_data=not_clear,
...             nodata=1)
masked_array(data=[22080],
             mask=[False],
       fill_value=1,
            dtype=int16)

Parameters:
  • data (numpy.ma.masked_array) – The QA Raster Data

  • clear_data (List[float]) – The bits values to be considered as Clear. Default is [].

  • not_clear_data (List[float]) – The bits values to be considered as Not Clear Values (Cloud,Shadow, etc).

  • nodata (float) – Pixel nodata value.

  • confidence (QAConfidence) – The confidence rules mapping. See more in QAConfidence.

Returns:

An array whose values represent clear_data and whose masked values represent not_clear_data.

Return type:

numpy.ma.masked_array

cube_builder.utils.image.get_resample_method(name: str) Resampling

Retrieve a resampling method from name.

Note

This method uses rasterio.warp.Resampling.

Parameters:

name – The resampling name

cube_builder.utils.image.linear_raster_scale(array: ndarray | MaskedArray, input_range: Tuple[int, int], output_range: Tuple[int, int] = (0, 255)) ndarray | MaskedArray

Clip the values in an array and apply linear rescaling.

Note

This function is compatible with the numpy.ma module

Parameters:
  • array (ArrayType) – Input raster

  • input_range – The array min and max values

  • output_range – The output min and max values to rescale to. Defaults to 0, 255.

Returns:

scaled array (in float)

Return type:

ArrayType
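The clip-and-rescale step amounts to a linear map between the two ranges, roughly as follows (a sketch assuming those semantics; the helper name is invented):

```python
import numpy

def linear_scale(array, input_range, output_range=(0, 255)):
    in_min, in_max = input_range
    out_min, out_max = output_range
    # Clip to the input range, then map it linearly onto the output range
    clipped = numpy.clip(array, in_min, in_max).astype(float)
    return (clipped - in_min) * (out_max - out_min) / (in_max - in_min) + out_min

arr = numpy.array([-100, 0, 5000, 10000, 20000])
scaled = linear_scale(arr, (0, 10000))
```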

cube_builder.utils.image.match_histogram_with_merges(source: str, source_mask: str, reference: str, reference_mask: str, **kwargs)

Normalize the source image histogram with reference image.

This function wraps skimage.exposure.match_histograms, which manipulates the pixels of an input image so that its histogram matches that of the reference image.

See more in Histogram Matching.

Note

It overwrites the source file.

Parameters:
  • source (str) – Path to the rasterio data set file

  • source_mask (str) – Path to the rasterio data set file

  • reference (str) – Path to the rasterio data set file

  • reference_mask (str) – Path to the rasterio data set file
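Ignoring file I/O and masks, histogram matching itself can be sketched in plain numpy (a simplified quantile-mapping stand-in for skimage.exposure.match_histograms, not the library's implementation):

```python
import numpy

def match_histograms_1d(source, reference):
    """Map each source pixel to the reference value at the same quantile."""
    src_order = numpy.argsort(source.ravel())
    ref_sorted = numpy.sort(reference.ravel())
    # Resample the reference quantiles to the number of source pixels
    quantiles = numpy.interp(numpy.linspace(0, 1, source.size),
                             numpy.linspace(0, 1, reference.size),
                             ref_sorted)
    matched = numpy.empty(source.size, dtype=float)
    # The i-th smallest source pixel receives the i-th reference quantile
    matched[src_order] = quantiles
    return matched.reshape(source.shape)

src = numpy.array([3, 0, 2, 1])
ref = numpy.array([10, 20, 30, 40])
matched = match_histograms_1d(src, ref)
```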

cube_builder.utils.image.radsat_extract_bits(bit_value: int | ndarray, bit_start: int, bit_end: int | None = None)

Extract bitwise values from image.

This method uses bitwise operations to identify pixel saturation. According to the LaSRC Product Guide, the Landsat Radiometric Saturation Quality Assessment Band (radsat_qa) is a bit-packed representation of which sensor bands were saturated during capture. The value 1 represents a saturated pixel, while 0 is valid data. For Landsat-8, the following table represents pixel saturation:

Bit    Bit Value    Description
  0        1        Data Fill Flag
  1        2        Band 1 Data Saturation Flag
  2        4        Band 2 Data Saturation Flag
  3        8        Band 3 Data Saturation Flag
  4       16        Band 4 Data Saturation Flag
  5       32        Band 5 Data Saturation Flag
  6       64        Band 6 Data Saturation Flag
  7      128        Band 7 Data Saturation Flag
  8      256        Band 8 Data Saturation Flag
  9      512        Band 9 Data Saturation Flag
 10     1024        Band 10 Data Saturation Flag
 11     2048        Band 11 Data Saturation Flag

Example

>>> from cube_builder.utils.image import radsat_extract_bits
>>> # Represents band 10 (1024) and band 1 (2) being saturated.
>>> # Check if any band is saturated
>>> radsat_extract_bits(1026, 1, 7)
1
>>> # You can also pass the numpy array
>>> # radsat_extract_bits(numpy.random.randint(0, 1028, size=(100, 100)), 1, 7)

cube_builder.utils.image.raster_convexhull(imagepath: str, epsg='EPSG:4326') dict

Get a raster image footprint.

Parameters:
  • imagepath (str) – image file

  • epsg (str) – geometry EPSG

See:

https://rasterio.readthedocs.io/en/latest/topics/masks.html

cube_builder.utils.image.raster_extent(imagepath: str, epsg='EPSG:4326') Polygon

Get raster extent in arbitrary CRS.

Parameters:
  • imagepath (str) – Path to image

  • epsg (str) – EPSG Code of result crs

Returns:

geojson-like geometry

Return type:

dict

cube_builder.utils.image.rescale(array: ndarray | MaskedArray, multiplier: float, new_scale: float, origin_additive: float = 0, dtype=None)

Rescale an array into new range.

To prevent data loss or invalid values while casting, results that overflow are clamped to the maximum and minimum values of dtype.

Tip

When dealing with a negative origin_additive factor, or with values that may be negative, make sure to use the right numpy dtype and numpy masked arrays to mask nodata values and avoid value-limit coercion.

Note

When the result value overflows the data type, the value is coerced to the data type limits. See more in numpy.iinfo.

Parameters:
  • array – Input array

  • multiplier – Origin array scale multiplier

  • new_scale – Target scale factor.

  • origin_additive – Origin additive factor

  • dtype – New data type for casting. Default is original array.

Examples

This example covers rescaling Landsat Collection 2 arrays (1-65535, scale=0.0000275, offset=-0.2) into 0-10000 values.

 >>> import numpy
 >>> from cube_builder.utils.image import rescale
 >>> arr3d = numpy.random.randint(1, 65535, (3, 3), dtype=numpy.uint16)
 >>> rescale(arr3d, 0.0000275, new_scale=0.0001, origin_additive=-0.2)
 array([[15065.675,  6057.5  ,  2893.075],
        [ 3523.375, 14349.3  ,  9762.3  ],
        [14221.425, -1438.725,  -363.75 ]])

cube_builder.utils.image.save_as_cog(destination: str, raster, mode='w', tags=None, block_size=None, **profile)

Save the raster file as Cloud Optimized GeoTIFF.

See also

Cloud Optimized GeoTiff https://gdal.org/drivers/raster/cog.html

Parameters:
  • destination – Path to store the data set.

  • raster – Numpy raster values to persist in disk

  • mode – Default rasterio mode. Default is ‘w’ but you also can set ‘r+’.

  • tags – Tag values (Dict[str, str]) to write on dataset.

  • **profile – Rasterio profile values to add in dataset.

cube_builder.utils.image.validate(row: Row)

Validate each merge result.

cube_builder.utils.image.validate_merges(images: LegacyCursorResult, num_threads: int = 2) dict

Validate each merge retrieved from Activity.list_merge_files.

Parameters:
  • images – Activity merge images

  • num_threads – Concurrent processes to validate

Band Index Generator

Simple data cube band generator.

cube_builder.utils.index_generator.BandMapFile

Type in which a key (a data cube band name) points to the generated file on disk.

alias of Dict[str, str]

cube_builder.utils.index_generator.generate_band_indexes(cube: Collection, scenes: dict, period: str, tile_id: str, reuse_data_cube: Collection | None = None, **kwargs) Dict[str, str]

Generate data cube custom bands based in string-expression on table band_indexes.

This method looks for custom bands in the Collection Band definition. A custom band must have its metadata property filled out according to bdc_catalog.jsonschemas.band-metadata.json.

Note

When the collection does not have any index band, an empty dict is returned.

Raises:

RuntimeError – when an error occurs while interpreting the band expression in the Python virtual machine.

Returns:

A dict with the generated bands.
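Evaluating a string expression of this kind can be sketched as below (the expression, band names, and values are invented for illustration; see the band-metadata JSON schema for the actual metadata format):

```python
import numpy

# Hypothetical NDVI-like index expression, as it might appear in band metadata
expression = "10000. * (nir - red) / (nir + red)"

bands = {
    "red": numpy.array([500.0, 1200.0]),
    "nir": numpy.array([3000.0, 1800.0]),
}

# Evaluate the expression with the band arrays as the only available names
result = eval(expression, {"__builtins__": {}}, bands)
```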

Tasks

Define a structure component to run celery worker.

cube_builder.celery.worker.load_models(*args, **kwargs)

Load celery models when worker is ready.

Define celery tasks for Cube Builder.

The processing workflow consists of:

Search    ->    Merge    ->    prepare_blend    ->    blend    ->    publish

cube_builder.celery.tasks.warp_merge(activity, band_map, mask, force=False, data_dir=None, **kwargs)

Execute datacube merge task.

This task consists in the following steps:

  • Prepare a raster using dimensions of datacube GRS schema.

  • Open collection dataset with RasterIO and reproject to datacube GRS Schema.

  • Fill the respective path/row into the raster

Parameters:
  • activity – Datacube Activity Model

  • force – Flag to build data cube without cache.

Returns:

Validated activity

cube_builder.celery.tasks.prepare_blend(merges, band_map: dict, reuse_data_cube=None, **kwargs)

Receive merges by period and prepare task blend.

This task prepares the celery task definition for blend. A blend requires both the data set quality band and the other bands. Therefore, these values must be grouped by temporal resolution before the blend tasks are scheduled.

cube_builder.celery.tasks.blend(activity, band_map, build_clear_observation=False, reuse_data_cube=None, **kwargs)

Execute datacube blend task.

Parameters:
  • activity – Datacube Activity Model.

  • band_map – Band mapping with common_name and original band name.

  • build_clear_observation – Generate band "Clear Observation".

Returns:

Validated activity

cube_builder.celery.tasks.publish(blends, band_map, quality_band: str, reuse_data_cube=None, **kwargs)

Execute publish task and catalog datacube result.

Parameters:

activity – Datacube Activity Model

cube_builder.celery.tasks.capture_traceback(exception=None)

Retrieve stacktrace as string.

cube_builder.celery.tasks.create_execution(activity: dict) Activity

Create cube-builder activity and prepare celery execution.

Parameters:

activity – Cube Builder Activity dict

Returns:

Activity – the cube builder activity model