aggregate
hierarchy
Runner for the collate stage of the climate aggregates pipeline.
This module provides functions to collate block-level datasets into measure-level datasets. The compilation process: 1. Loads raw results from all blocks for a given hierarchy, measure, and scenario 2. Aggregates the data up the location hierarchy 3. Produces views for subset hierarchies 4. Saves the results as measure-level datasets
hierarchy(agg_version: str, hierarchy: list[str], agg_measure: list[str], agg_scenario: list[str], population_model_dir: str, output_dir: str, queue: str) -> None
Collate block-level datasets into measure-level datasets.
This command: 1. Identifies which combinations of hierarchy, measure, and scenario need compilation 2. Creates parallel jobs to collate each combination 3. Runs the jobs using jobmon
Source code in src/climate_data/aggregate/hierarchy.py
hierarchy_main(agg_version: str, hierarchy: str, measure: str, scenario: str, population_model_dir: str, output_dir: str, *, progress_bar: bool = False) -> None
Collate block-level datasets into measure/scenario-level datasets.
This function: 1. Loads all block-level datasets for a given hierarchy, measure, and scenario 2. Combines them into a single dataset 3. Aggregates the data up the location hierarchy 4. Produces views for subset hierarchies 5. Saves the results as measure-level datasets
Parameters
agg_version The version identifier hierarchy The full aggregation hierarchy to process measure The climate measure to process scenario The climate scenario to process population_model_dir Path to the population model directory output_dir Path to save results progress_bar Whether to show a progress bar
Source code in src/climate_data/aggregate/hierarchy.py
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 | |
hierarchy_task(agg_version: str, hierarchy: str, agg_measure: str, agg_scenario: str, population_model_dir: str, output_dir: str, *, progress_bar: bool) -> None
Collate block-level datasets into measure-level datasets.
This command collates results for a specific hierarchy, measure, and scenario.
Source code in src/climate_data/aggregate/hierarchy.py
utils
aggregate_climate_to_hierarchy(data: pd.DataFrame, hierarchy: pd.DataFrame) -> pd.DataFrame
Create all aggregate climate values for a given hierarchy from most-detailed data.
Parameters
data The most-detailed climate data to aggregate. hierarchy The hierarchy to aggregate the data to.
Returns
pd.DataFrame The climate data with values for all levels of the hierarchy.
Source code in src/climate_data/aggregate/utils.py
build_bounds_map(raster_template: rt.RasterArray, shape_values: list[tuple[Polygon | MultiPolygon, int]]) -> dict[int, tuple[slice, slice]]
Build a map of location IDs to buffered slices of the raster template.
Parameters
raster_template The raster template to build the bounds map for. shape_values A list of tuples where the first element is a shapely Polygon or MultiPolygon in the CRS of the raster template and the second element is the location ID of the shape.
Returns
dict[int, tuple[slice, slice]] A dictionary mapping location IDs to a tuple of slices representing the bounds of the location in the raster template. The slices are buffered by 10 pixels to ensure that the entire shape is included in the mask.
Source code in src/climate_data/aggregate/utils.py
build_location_masks(hierarchy: str, block_key: str, pm_data: PopulationModelData) -> tuple[dict[str, slice], dict[int, tuple[slice, slice, npt.NDArray[np.bool_]]]]
Build location masks for each location in the hierarchy.
Parameters
hierarchy The name of the hierarchy to build location masks for. Must be one of of the keys of the HIERARCHY_MAP constant. pm_data PopulationModelData object to load the population model data.
Returns
tuple[dict[int, tuple[slice, slice]], npt.NDArray[np.uint32]] The first element is a dictionary mapping location IDs to a tuple of slices representing the bounds of the location in the mask. This is useful for subseting the mask and data arrays before processing as downstream operations scale with the number of pixels in the mask. The second element is the mask itself, a 2D array of uint32 values where each location ID is represented by a unique integer value.
Source code in src/climate_data/aggregate/utils.py
get_bbox(raster: rt.RasterArray, crs: str | None = None) -> shapely.Polygon
Get the bounding box of a raster array.
Parameters
raster The raster array to get the bounding box of. crs The CRS to return the bounding box in. If None, the bounding box is returned in the CRS of the raster.
Returns
shapely.Polybon The bounding box of the raster in the CRS specified by the crs parameter.