API Reference
Data Cube Warp
- cube_builder.utils.processing.merge(merge_file: str, mask: dict, assets: List[dict], band: str, band_map: dict, quality_band: str, collection: str, build_provenance=False, compute=False, native_grid: bool = False, **kwargs)
Apply the data cube merge to scenes.
The Merge (or Warp) is a procedure that crops and mosaics all images overlapping a target tile of the common grid, for a specific date.
- Parameters:
merge_file – Path where the data cube merge will be stored
assets – List of collection assets for the period
band – Merge band name
band_map – Map of cube band name and common name.
build_provenance – Build a provenance file for the merge (used in combined collections)
**kwargs – Extra properties
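The crop-and-mosaic idea can be sketched with plain numpy. This is an illustrative sketch, not the actual cube_builder implementation: `mosaic_into_tile` and its arguments are hypothetical names, and it assumes the scene windows were already warped to the tile CRS.

```python
import numpy as np

NODATA = -9999

def mosaic_into_tile(tile_shape, scenes):
    """Paste overlapping scene windows into a common tile grid (illustrative sketch).

    `scenes` is a list of (row_off, col_off, array) tuples already warped to the
    tile grid; NODATA pixels in a scene never overwrite previously written data.
    """
    target = np.full(tile_shape, NODATA, dtype=np.int16)
    for row_off, col_off, data in scenes:
        rows, cols = data.shape
        window = target[row_off:row_off + rows, col_off:col_off + cols]
        # Only fill positions that are still nodata and have valid source data
        fill = (window == NODATA) & (data != NODATA)
        window[fill] = data[fill]
    return target
```

Iterating scenes in priority order means earlier (better) scenes win overlapping pixels, which is the behavior a merge for a single date needs.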
Data Cube Composition
- cube_builder.utils.processing.blend(activity, band_map, quality_band, build_clear_observation=False, block_size=None, reuse_data_cube=None, apply_valid_range=None, **kwargs)
Apply the blend and generate a raster from the activity.
The blend operation stacks all the images (merges) in a period. The stacking is driven by the best-pixel image (best clear ratio). Cloud pixels are masked with the numpy.ma module, which enables temporal composite functions such as MEDIAN and AVG over these rasters.
The following example represents a Landsat-8 16-day data cube using the functions Best Pixel (Stack - LCF) and Median (MED) over the 16-day period from 1/1 to 16/1. The images from 10/1 and 15/1 were found, with the values described below:
10/1:
    Quality              Nir
    0 0   2   4          702  876  7000 9000
    0 1   1   4          687  444  421  9113    => Clear Ratio = 50%
    0 2   2   4          1241 1548 2111 1987    => Cloud Ratio = 50%

15/1:
    Quality              Nir
    0 0 255 255          854  756  9800 9454
    0 1   1   1          945  400  402  422     => Clear Ratio ~= 83%
    0 0   0   0          869  975  788  799     => Cloud Ratio ~= 0%
According to the Brazil Data Cube User Guide, the best image is 15/1 (clear ratio ~83%) and the worst is 10/1 (50%). The resulting data cube will be:
Landsat-8_30_16D_LCF:
    Quality              Nir                    Provenance (Day of Year)
    0 0 2 4              854 756 7000 9000      15 15 10 10
    0 1 1 1              945 400 411  422       15 15 15 15
    0 0 0 0              869 975 788  799       15 15 15 15

Landsat-8_30_16D_MED:
    Nir
    778   816  -9999 -9999
    816   422   402   422
    1055  975   788   799
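The best-pixel stacking and median compositing described above can be sketched with numpy.ma. This is an illustrative sketch under simplified assumptions (boolean cloud masks instead of a quality band); `temporal_composite` is a hypothetical helper, not cube_builder's actual blend code.

```python
import numpy as np

def temporal_composite(rasters, cloud_masks):
    """Compose a time series of rasters (illustrative sketch).

    `rasters` is a list of 2D arrays; `cloud_masks` marks cloudy pixels as True.
    Returns (best_pixel, median): the scene with the best clear ratio leads the
    stack, and the median is taken over cloud-free observations per pixel.
    """
    stack = np.ma.stack([np.ma.array(r, mask=m) for r, m in zip(rasters, cloud_masks)])
    # Order scenes by clear ratio (fraction of unmasked pixels), best first
    clear_ratio = [1.0 - m.mean() for m in cloud_masks]
    order = np.argsort(clear_ratio)[::-1]
    best = stack[order[0]].copy()
    for idx in order[1:]:
        # Fill remaining cloudy pixels from the next-best observation
        best[best.mask] = stack[idx][best.mask]
    median = np.ma.median(stack, axis=0)
    return best, median
```

The fill loop mirrors the documented behavior: nodata/cloud pixels in the best image are taken from the next observation in clear-ratio order.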
Note
When build_clear_observation is set, make sure not to execute it in parallel, since it is not thread-safe. The provenance band is not generated for MEDIAN products. For nodata pixels in the best image, the cube builder will try to find a useful pixel in the next observation. It may be cloud/cloud-shadow (when there is no valid pixel 0 or 1); otherwise, it is filled as nodata.
- Parameters:
activity – Prepared blend activity metadata
band_map – Map of data cube bands (common_name : name)
build_clear_observation – Flag to dispatch generation of the Clear Observation band. It is not thread-safe.
- Returns:
A processed activity with the generated values.
Utils for Image Operation
- cube_builder.utils.processing.compute_data_set_stats(file_path: str, mask: dict, compute: bool = True) Tuple[float, float]
Compute data set efficacy and cloud ratio.
It opens the given file_path and calculates the mask statistics, such as efficacy and cloud ratio.
- Parameters:
file_path – Path to the given data set
data_set_name – Data set name
- Returns:
Tuple of efficacy and cloud ratio, respectively.
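The efficacy and cloud-ratio statistics can be sketched as simple pixel counts over a quality mask. This is an illustrative sketch, not the actual implementation; `mask_stats` and its exact ratio definitions are assumptions.

```python
import numpy as np

def mask_stats(mask_array, clear_values, not_clear_values, nodata):
    """Compute efficacy and cloud ratio of a quality mask (illustrative sketch).

    Efficacy is taken as the share of clear pixels over valid (non-nodata)
    pixels; cloud ratio is the share of not-clear pixels over valid pixels.
    """
    valid = mask_array != nodata
    total_valid = valid.sum()
    if total_valid == 0:
        return 0.0, 0.0
    clear = np.isin(mask_array, clear_values) & valid
    not_clear = np.isin(mask_array, not_clear_values) & valid
    return float(clear.sum() / total_valid), float(not_clear.sum() / total_valid)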
Define a utility to validate merge images.
- class cube_builder.utils.image.QAConfidence(cloud=None, cloud_shadow=None, cirrus=None, snow=None, oli=None)
Type for Quality Assessment definition for Landsat Collection 2.
These properties will be evaluated in the Python virtual machine, for example:

# Discard all cloud values whose confidence is greater than or equal to MEDIUM.
qa = QAConfidence(cloud='cloud >= MEDIUM', cloud_shadow=None, cirrus=None, snow=None)
- apply(data)
Apply the Bit confidence to the Quality Assessment mask.
- Parameters:
data (numpy.ma.MaskedArray) – The Masked raster QA Pixel
- Returns:
numpy.ma.MaskedArray – The masked pixels that satisfy the confidence attributes.
- cirrus: str | None
Represents the Cirrus Confidence.
- cloud: str | None
Represents the Cloud Confidence.
- cloud_shadow: str | None
Represents the Cloud Shadow Confidence.
- oli: bool | ndarray
Flag to identify the OLI/TIRS sensor.
- snow: str | None
Represents the Snow/Ice Confidence.
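The eval-based confidence check can be sketched for the cloud-confidence case. This is an illustrative sketch: `apply_cloud_confidence` is a hypothetical helper, and it assumes the Landsat Collection 2 QA_PIXEL layout where bits 8-9 hold the cloud confidence (0 none, 1 low, 2 medium, 3 high).

```python
import numpy as np

# Confidence levels as encoded in the Landsat Collection 2 QA_PIXEL band
LOW, MEDIUM, HIGH = 1, 2, 3
CLOUD_CONFIDENCE_OFFSET = 8  # bits 8-9 hold the cloud confidence

def apply_cloud_confidence(qa_pixel, expression):
    """Evaluate a QAConfidence-style expression such as 'cloud >= MEDIUM'.

    Returns a boolean array marking pixels that satisfy the expression
    (illustrative sketch of the eval-based approach described above).
    """
    cloud = (qa_pixel >> CLOUD_CONFIDENCE_OFFSET) & 0b11
    context = {'cloud': cloud, 'LOW': LOW, 'MEDIUM': MEDIUM, 'HIGH': HIGH}
    return eval(compile(expression, '<qa-confidence>', 'eval'), {}, context)
```

Because the extracted confidence is a numpy array, the comparison in the expression broadcasts and yields a boolean mask per pixel.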
- class cube_builder.utils.image.SmartDataSet(file_path: str, mode='r', tags=None, **properties)
Utility class that automatically closes a rasterio data set.
This helper class avoids memory leaks from data sets left open in memory.
- close()
Close rasterio data set.
- cube_builder.utils.image.check_file_integrity(file_path: str | Path, read_bytes: bool = False) bool
Check Raster File integrity.
- Parameters:
file_path (str|Path) – Path to the raster file
read_bytes (bool) – Read the entire raster band. Default is False
- cube_builder.utils.image.create_empty_raster(location: str, proj4: str, dtype: str, xmin: float, ymax: float, resolution: List[float], dist: List[float], nodata: float, cog=True)
Create a data set filled with nodata.
This method generates an empty scene in order to keep the data cube timeline consistent.
- Parameters:
location (str) – Path where file will be generated.
proj4 (str) – Proj4 with Coordinate Reference System.
dtype (str) – Data type
xmin (float) – Image minx (Related to geotransform)
ymax (float) – Image ymax
resolution (List[float]) – Pixel resolution (X, Y)
dist (List[float]) – The distance of X, Y (Scene offset)
nodata (float) – Scene nodata.
cog (bool) – Flag to generate a Cloud Optimized GeoTIFF. Default is True.
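The grid arithmetic behind an empty scene can be sketched without rasterio. This is an illustrative sketch under stated assumptions: `empty_scene` is a hypothetical helper, and the affine transform follows the GDAL geotransform convention (origin at the upper-left corner, negative y step).

```python
import numpy as np

def empty_scene(xmin, ymax, resolution, dist, nodata, dtype='int16'):
    """Build the grid and nodata array for an empty scene (illustrative sketch).

    `resolution` is (res_x, res_y) and `dist` is the (x, y) extent of the
    scene; rows and columns are derived from extent over pixel size.
    """
    res_x, res_y = resolution
    dist_x, dist_y = dist
    cols = int(round(dist_x / res_x))
    rows = int(round(dist_y / res_y))
    # GDAL-style geotransform: (xmin, res_x, 0, ymax, 0, -res_y)
    transform = (xmin, res_x, 0.0, ymax, 0.0, -res_y)
    data = np.full((rows, cols), nodata, dtype=dtype)
    return transform, data
```

A real implementation would then write `data` with rasterio using this transform and the given proj4 CRS.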
- cube_builder.utils.image.extract_qa_bits(band_data, bit_location)
Retrieve the bit information from given position.
- Parameters:
band_data (int|numpy.ma.masked_array) –
bit_location (int) –
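Bit extraction from a packed QA word reduces to a shift and a mask. The sketch below illustrates the idea; `extract_bits` is a hypothetical stand-in, not the exact cube_builder function.

```python
import numpy as np

def extract_bits(band_data, bit_location, bit_length=1):
    """Extract the value stored at a given bit position (illustrative sketch).

    Shifts the packed QA word right and masks off `bit_length` bits, so a
    single-bit flag yields 0 or 1. Works on Python ints and numpy arrays alike.
    """
    return (band_data >> bit_location) & ((1 << bit_length) - 1)
```

Because numpy overloads `>>` and `&`, the same expression applies element-wise to whole QA rasters.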
- cube_builder.utils.image.generate_cogs(input_data_set_path, file_path, profile='deflate', block_size=None, profile_options=None, **options)
Generate Cloud Optimized GeoTIFF files (COG).
- Parameters:
input_data_set_path (str) –
file_path (str) –
profile (str) –
profile_options (dict) –
block_size (int) –
- Returns:
Path to COG.
- cube_builder.utils.image.get_qa_mask(data: MaskedArray, clear_data: List[float] | None = None, not_clear_data: List[float] | None = None, nodata: float | None = None, confidence: QAConfidence | None = None) MaskedArray
Extract Quality Assessment Bits from Landsat Collection 2 Level-2 products.
This method uses the bitwise operation to extract bits according to the document Landsat 8 Collection 2 (C2) Level 2 Science Product (L2SP) Guide, page 13.
Note
This method supports Landsat-4 (and later) Collection 2 Science products. Please take a look at cube_builder.utils.image.QAConfidence if you are dealing with multiple sensors.
Example
>>> import numpy
>>> from cube_builder.utils.image import QAConfidence, get_qa_mask
>>> mid_cloud_confidence = QAConfidence(cloud='cloud == MEDIUM', cloud_shadow=None,
...                                     cirrus=None, snow=None, oli=True)
>>> clear = [6, 7]  # Clear and Water
>>> not_clear = [1, 2, 3, 4]  # Dilated Cloud, Cirrus, Cloud, Cloud Shadow
>>> get_qa_mask(numpy.ma.array([22080], dtype=numpy.int16, fill_value=1),
...             clear_data=clear, not_clear_data=not_clear,
...             nodata=1, confidence=mid_cloud_confidence)
masked_array(data=[--], mask=[ True], fill_value=1, dtype=int16)
>>> # When no cloud confidence is set, this value will be Clear since the Cloud Pixel bit is off.
>>> get_qa_mask(numpy.ma.array([22080], dtype=numpy.int16, fill_value=1),
...             clear_data=clear, not_clear_data=not_clear,
...             nodata=1)
masked_array(data=[22080], mask=[False], fill_value=1, dtype=int16)
- Parameters:
data (numpy.ma.masked_array) – The QA Raster Data
clear_data (List[float]) – The bit values to be considered as Clear. Default is [].
not_clear_data (List[float]) – The bit values to be considered as Not Clear (Cloud, Shadow, etc.).
nodata (float) – Pixel nodata value.
confidence (QAConfidence) – The confidence rules mapping. See more in QAConfidence.
- Returns:
An array in which the values represent clear_data and the masked values represent not_clear_data.
- Return type:
numpy.ma.masked_array
- cube_builder.utils.image.get_resample_method(name: str) Resampling
Retrieve a resampling method from name.
Note
This method uses rasterio.warp.Resampling.
- Parameters:
name – The resampling name
- cube_builder.utils.image.linear_raster_scale(array: ndarray | MaskedArray, input_range: Tuple[int, int], output_range: Tuple[int, int] = (0, 255)) ndarray | MaskedArray
Clip the values in an array and apply linear rescaling.
Note
This function is compatible with the numpy.ma module.
- Parameters:
array (ArrayType) – Input raster
input_range – The array min and max values
output_range – The output min and max values to rescale to. Defaults to (0, 255).
- Returns:
scaled array (in float)
- Return type:
ArrayType
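The clip-then-rescale behavior can be sketched directly. This is an illustrative sketch of linear min-max rescaling, not the library's exact code; `linear_scale` is a hypothetical name.

```python
import numpy as np

def linear_scale(array, input_range, output_range=(0, 255)):
    """Clip to `input_range` and rescale linearly into `output_range` (sketch).

    Values outside the input range are clipped first, then mapped linearly,
    the usual min-max normalization used for quicklook generation.
    """
    in_min, in_max = input_range
    out_min, out_max = output_range
    clipped = np.clip(array, in_min, in_max).astype('float64')
    scaled = (clipped - in_min) / (in_max - in_min)
    return scaled * (out_max - out_min) + out_min
```

Because `np.clip` works on masked arrays as well, the same sketch stays compatible with numpy.ma inputs.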
- cube_builder.utils.image.match_histogram_with_merges(source: str, source_mask: str, reference: str, reference_mask: str, **kwargs)
Normalize the source image histogram with a reference image.
This function wraps skimage.exposure.match_histograms, which manipulates the pixels of an input image so that its histogram matches the reference image.
See more in Histogram Matching.
Note
It overwrites the source file.
- Parameters:
source (str) – Path to the rasterio data set file
source_mask (str) – Path to the rasterio data set file
reference (str) – Path to the rasterio data set file
reference_mask (str) – Path to the rasterio data set file
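The quantile-mapping idea behind histogram matching can be sketched in plain numpy. This is an illustrative sketch of the technique (the same idea as skimage.exposure.match_histograms), not the wrapped implementation, and it ignores the mask handling of the real function.

```python
import numpy as np

def match_histograms(source, reference):
    """Match the source histogram to the reference (illustrative sketch).

    Maps each source value to the reference value found at the same
    cumulative quantile of the reference distribution.
    """
    src_values, src_indices, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Cumulative distribution functions of both images
    src_quantiles = np.cumsum(src_counts) / source.size
    ref_quantiles = np.cumsum(ref_counts) / reference.size
    # Interpolate source quantiles onto the reference value range
    matched = np.interp(src_quantiles, ref_quantiles, ref_values)
    return matched[src_indices].reshape(source.shape)
```

Matching an image against itself is the identity, which is a quick sanity check for the mapping.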
- cube_builder.utils.image.radsat_extract_bits(bit_value: int | ndarray, bit_start: int, bit_end: int | None = None)
Extract bitwise values from image.
This method uses bitwise operations to identify pixel saturation. According to the LaSRC Product Guide, the Landsat Radiometric Saturation Quality Assessment band (radsat_qa) is a bit-packed representation of which sensor bands were saturated during data capture. The value 1 represents a saturated pixel while 0 is valid data. For Landsat-8, the following table describes pixel saturation:
Bit   Bit Value   Description
0     1           Data Fill Flag
1     2           Band 1 Data Saturation Flag
2     4           Band 2 Data Saturation Flag
3     8           Band 3 Data Saturation Flag
4     16          Band 4 Data Saturation Flag
5     32          Band 5 Data Saturation Flag
6     64          Band 6 Data Saturation Flag
7     128         Band 7 Data Saturation Flag
8     256         Band 8 Data Saturation Flag
9     512         Band 9 Data Saturation Flag
10    1024        Band 10 Data Saturation Flag
11    2048        Band 11 Data Saturation Flag
Example
>>> from cube_builder.utils.image import radsat_extract_bits
>>> # Represents that band 10 (1024) and band 1 (2) are saturated.
>>> # Check if any band is saturated
>>> radsat_extract_bits(1026, 1, 7)
1
>>> # You can also pass a numpy array
>>> # radsat_extract_bits(numpy.random.randint(0, 1028, size=(100, 100)), 1, 7)
- cube_builder.utils.image.raster_convexhull(imagepath: str, epsg='EPSG:4326') dict
Get a raster image footprint.
- Parameters:
imagepath (str) – image file
epsg (str) – geometry EPSG
- cube_builder.utils.image.raster_extent(imagepath: str, epsg='EPSG:4326') Polygon
Get raster extent in arbitrary CRS.
- Parameters:
imagepath (str) – Path to image
epsg (str) – EPSG Code of result crs
- Returns:
geojson-like geometry
- Return type:
dict
- cube_builder.utils.image.rescale(array: ndarray | MaskedArray, multiplier: float, new_scale: float, origin_additive: float = 0, dtype=None)
Rescale an array into a new range.
To prevent data loss or invalid data while casting, values that overflow are clamped to the maximum and minimum of dtype.
Tip
When dealing with a negative origin_additive factor, or with values that may be negative, make sure to use the right numpy dtype and numpy masked arrays to mask nodata values, in order to avoid coercion to the type limits.
Note
When the result value overflows the data type, the value is coerced to the data type limits. See more in numpy.iinfo.
- Parameters:
array – Input array
multiplier – Origin array scale multiplier
new_scale – Target scale factor.
origin_additive – Origin additive factor
dtype – New data type for casting. Default is original array.
Examples
This example covers rescaling Landsat Collection 2 arrays (1-65535, scale=0.0000275, offset=-0.2) into 0-10000 values.
>>> import numpy
>>> from cube_builder.utils.image import rescale
>>> arr3d = numpy.random.randint(1, 65535, (3, 3), dtype=numpy.uint16)
>>> rescale(arr3d, 0.0000275, new_scale=0.0001, origin_additive=-0.2)
array([[15065.675,  6057.5  ,  2893.075],
       [ 3523.375, 14349.3  ,  9762.3  ],
       [14221.425, -1438.725,  -363.75 ]])
- cube_builder.utils.image.save_as_cog(destination: str, raster, mode='w', tags=None, block_size=None, **profile)
Save the raster file as Cloud Optimized GeoTIFF.
See also
Cloud Optimized GeoTiff https://gdal.org/drivers/raster/cog.html
- Parameters:
destination – Path to store the data set.
raster – Numpy raster values to persist in disk
mode – Default rasterio mode. Default is ‘w’, but you can also set ‘r+’.
tags – Tag values (Dict[str, str]) to write on dataset.
**profile – Rasterio profile values to add in dataset.
- cube_builder.utils.image.validate(row: Row)
Validate each merge result.
- cube_builder.utils.image.validate_merges(images: LegacyCursorResult, num_threads: int = 2) dict
Validate each merge retrieved from Activity.list_merge_files.
- Parameters:
images – Activity merge images
num_threads – Concurrent processes to validate
Band Index Generator
Simple data cube band generator.
- cube_builder.utils.index_generator.BandMapFile
Type in which a key (a data cube band name) points to the generated file on disk.
alias of Dict[str, str]
- cube_builder.utils.index_generator.generate_band_indexes(cube: Collection, scenes: dict, period: str, tile_id: str, reuse_data_cube: Collection | None = None, **kwargs) Dict[str, str]
Generate data cube custom bands based on string expressions from the band_indexes table.
This method looks for custom bands in the Collection Band definition. A custom band must have its metadata property filled out according to bdc_catalog.jsonschemas.band-metadata.json.
Note
When the collection does not have any index band, it returns an empty dict.
- Raises:
RuntimeError – when an error occurs while interpreting the band expression in Python Virtual Machine.
- Returns:
A dict mapping band names to the generated band files.
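Evaluating a string-expression band over named numpy arrays can be sketched as follows. This is an illustrative sketch: `evaluate_band_expression` is a hypothetical helper, and only the RuntimeError behavior mirrors what the section above describes.

```python
import numpy as np

def evaluate_band_expression(expression, bands):
    """Evaluate a string-expression index band (illustrative sketch).

    `bands` maps band common names to numpy arrays; the expression is
    compiled and run with those arrays as the only available names.
    """
    code = compile(expression, '<band-expression>', 'eval')
    try:
        # Restrict builtins so the expression can only reference band arrays
        return eval(code, {'__builtins__': {}}, dict(bands))
    except Exception as exc:
        raise RuntimeError(f'Error evaluating {expression!r}: {exc}') from exc
```

For example, an NDVI-style expression `(nir - red) / (nir + red)` broadcasts element-wise over the band arrays.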
Tasks
Define a structure component to run celery worker.
- cube_builder.celery.worker.load_models(*args, **kwargs)
Load celery models when worker is ready.
Define celery tasks for Cube Builder.
The processing workflow consists of:
Search -> Merge -> prepare_blend -> blend -> publish
- cube_builder.celery.tasks.warp_merge(activity, band_map, mask, force=False, data_dir=None, **kwargs)
Execute datacube merge task.
This task consists of the following steps:
- Prepare a raster using the dimensions of the data cube GRS schema.
- Open the collection dataset with rasterio and reproject it to the data cube GRS schema.
- Fill the respective path row into the raster.
- Parameters:
activity – Data cube Activity Model
force – Flag to build the data cube without cache.
- Returns:
Validated activity
- cube_builder.celery.tasks.prepare_blend(merges, band_map: dict, reuse_data_cube=None, **kwargs)
Receive merges by period and prepare task blend.
This task prepares the celery task definition for blend. A blend requires both the data set quality band and the other bands, so these values must be grouped by temporal resolution before the blend tasks are scheduled.
- cube_builder.celery.tasks.blend(activity, band_map, build_clear_observation=False, reuse_data_cube=None, **kwargs)
Execute datacube blend task.
- Parameters:
activity – Data cube Activity Model.
band_map – Band mapping with common_name and original band name.
build_clear_observation – Generate the "Clear Observation" band.
- Returns:
Validated activity
- cube_builder.celery.tasks.publish(blends, band_map, quality_band: str, reuse_data_cube=None, **kwargs)
Execute publish task and catalog datacube result.
- Parameters:
activity – Data cube Activity Model
- cube_builder.celery.tasks.capture_traceback(exception=None)
Retrieve stacktrace as string.
- cube_builder.celery.tasks.create_execution(activity: dict) Activity
Create cube-builder activity and prepare celery execution.
- Parameters:
activity – Cube Builder Activity dict
- Returns:
Activity – the cube builder activity model