Add weight threshold option for temporal operations by tomvothecoder · Pull Request #683 · xCDAT/xcdat

tomvothecoder · 2024-07-30T22:18:55Z

Description

This PR adds a min_weight threshold parameter to temporal group averaging APIs, allowing users to set a minimum data coverage requirement for temporal averaging. This enhancement ensures temporal averages are only computed when sufficient data is available, with values below the threshold being set to NaN.

Adds min_weight parameter to group_average, climatology, and departures methods
Implements weight threshold validation and masking in weighted temporal averaging
Refactors weighted averaging logic into a dedicated _weighted_group_average method

More Context

Related to [Enhancement]: Add weight threshold option for averaging operations #531
Related to Add weight threshold option for spatial averaging #672

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

If applicable:

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass with my changes (locally and CI/CD build)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

codecov · 2024-07-30T22:38:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (683e973) to head (f21b2b6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #683   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1768      1781   +13     
=========================================
+ Hits          1768      1781   +13

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

tomvothecoder · 2024-08-27T22:50:14Z

xcdat/utils.py

+    This function is useful for cases where the weighting of data might be
+    skewed based on the availability of data. For example, if one season in a
+    time series has more significantly more missing data than other seasons, it
+    can result in inaccurate calculations of climatologies. Masking values that
+    do not meet the minimum weight threshold ensures more accurate calculations.


Is this an accurate description?

xcdat/utils.py

xcdat/temporal.py

tomvothecoder · 2025-03-26T21:12:49Z

To test this PR against real-world data, we can:

Create a script that loops over model data with different weight threshold options
Generate expected: CDAT temporal averaging with criteriaarg specified
Generate result: Runs xCDAT temporal averaging with min_weight specified
Compare results against expected for closeness and aligned np.nan values

lee1043 · 2025-04-23T18:35:24Z

climatology and departures seems not having min_weight parameter added. Can they have it as like group_average for consistency?

tomvothecoder · 2025-04-23T22:34:27Z

climatology and departures seems not having min_weight parameter added. Can they have it as like group_average for consistency?

I am converting this PR back to draft and will tag you once it is ready.

tomvothecoder · 2025-07-17T17:22:32Z

@pochedls I actually don't think this is dependent on #735. I will pick this one back up first since it is easier I think.

pochedls · 2025-07-17T18:30:23Z

@pochedls I actually don't think this is dependent on #735. I will pick this one back up first since it is easier I think.

I can't find where I linked these together, but I agree this is a related-but-separate issue.

tomvothecoder · 2025-08-14T21:14:57Z

It looks like "> min_weight" being used, wondering rather ">= min_weight" could be used. For example, when min_weight=1 (consider grid with no missing time step), I get an empty field (see below, rightmost figure).

Thank you for the review so far @lee1043.

@pochedls can you or someone else grant me read permission to /globa/cfs/projectdirs/m4581? I'm getting PermissionError: [Errno 13] Permission denied: '/global/cfs/projectdirs/m4581/obs4MIPs/obs4MIPs_LLNL/MOHC/HadISST-1-1/mon/ts/gn/v20250415/ts_mon_HadISST-1-1_PCMDI_gn_187001-202501.nc' with Jiwoo's example script.

lee1043 · 2025-08-14T21:55:50Z

@tomvothecoder I put the input data here in NERSC, can you check if you can access to it?

/pscratch/sd/l/lee1043/temporary/ts_mon_HadISST-1-1_PCMDI_gn_187001-202501.nc

tomvothecoder · 2025-08-14T23:34:48Z

@tomvothecoder I put the input data here in NERSC, can you check if you can access to it?

/pscratch/sd/l/lee1043/temporary/ts_mon_HadISST-1-1_PCMDI_gn_187001-202501.nc

This works, thanks Jiwoo.

tomvothecoder · 2025-08-20T00:02:57Z

xcdat/temporal.py

-            # Set averaged data to NaN where masked weights are below the
-            # minimum threshold.
+            # Mask averaged data where the fraction of weights in each group
+            # does not meet the minimum weight threshold (fractional).
            if self._min_weight > 0.0:
+                # The sum of all weights in each group (i.e., full coverage)
+                weight_sum_all = self._group_data(self._weights).sum(skipna=skipna)
+
+                # Fraction of weights present in each group.
+                weight_fraction = masked_weights_group_sum / weight_sum_all
+
+                # Mask the averaged data where the weight fraction is below
+                # the minimum weight threshold.
                dv_avg = xr.where(
-                    masked_weights_group_sum >= self._min_weight,
+                    weight_fraction >= self._min_weight,


Hey @lee1043 and @pochedls, I identified and fixed a bug after reviewing your comments, as shown in the code diff here.

Before

This compared against the absolute sum of valid weights, which could behave inconsistently across groups with different total weights.

# Set averaged data to NaN where masked weights are below the # minimum threshold. if self._min_weight > 0.0: dv_avg = xr.where( masked_weights_group_sum >= self._min_weight, ... )

After

Now the check is fractional coverage (masked / total), ensuring consistent thresholding across groups of different sizes.

# Mask averaged data where the fraction of weights in each group # does not meet the minimum weight threshold (fractional). if self._min_weight > 0.0: # The sum of all weights in each group (i.e., full coverage) weight_sum_all = self._group_data(self._weights).sum(skipna=skipna) # Fraction of weights present in each group weight_fraction = masked_weights_group_sum / weight_sum_all # Mask the averaged data where the weight fraction is below # the minimum weight threshold dv_avg = xr.where( weight_fraction >= self._min_weight, ... )

This logic matches the spatial average weight threshold function here:

xcdat/xcdat/spatial.py

Lines 784 to 844 in 15c5ee7

def _mask_var_with_weight_threshold(

self,

dv: xr.DataArray,

dv_mean: xr.DataArray,

dim: list[str],

weights: xr.DataArray,

min_weight: float,

) -> xr.DataArray:

"""Mask values that do not meet the minimum weight threshold with np.nan.

This function is useful for cases where the weighting of data might be

skewed based on the availability of data. For example, if a portion of

cells in a region has significantly more missing data than other other

regions, it can result in inaccurate spatial average calculations.

Masking values that do not meet the minimum weight threshold ensures

more accurate calculations.

Parameters

----------

dv : xr.DataArray

The weighted variable used for getting masked weights.

dv_mean : xr.DataArray

The average of the weighted variable.

dim: list[str]:

List of axis dimensions to average over.

weights : xr.DataArray

A DataArray containing either the regional weights used for weighted

averaging. ``weights`` must include the same axis dimensions and

dimensional sizes as the data variable.

min_weight : float, optional

Minimum threshold of data coverage (weight) required to compute

a spatial average for a grouping window. Must be between 0 and 1.

Useful for ensuring accurate averages in regions with missing data,

by default None (equivalent to 0.0).

The value must be between 0 and 1, where:

- 0/``None`` means no minimum threshold (all data is considered,

even if coverage is minimal).

- 1 means full data coverage is required (no missing data is

allowed).

Returns

-------

xr.DataArray

The average of the weighted variable with the minimum weight

threshold applied.

"""

# Sum all weights, including zero for missing values.

weight_sum_all = weights.sum(dim=dim)

masked_weights = _get_masked_weights(dv, weights)

weight_sum_masked = masked_weights.sum(dim=dim)

# Get fraction of the available weight.

frac = weight_sum_masked / weight_sum_all

# Nan out values that don't meet specified weight threshold.

dv_new = xr.where(frac >= min_weight, dv_mean, np.nan, keep_attrs=True)

dv_new.name = dv_mean.name

return dv_new

With this change, @lee1043’s script (here) now produces plots for all min_weight settings (shown below):

Next Steps

@lee1043 and @pochedls — can you review again to confirm this new logic is correct?

Side Note

My unit tests didn’t catch this bug and still pass with the new logic. This highlights the importance of validating with real data alongside unit tests, as we routinely do.

@tomvothecoder I was able to get the same plot using the latest version of this branch!

lee1043

Looks good to me! Thanks for the PR, @tomvothecoder!

xcdat/temporal.py

lee1043 · 2025-08-20T15:16:55Z

xcdat/temporal.py

-            # Set averaged data to NaN where masked weights are below the
-            # minimum threshold.
+            # Mask averaged data where the fraction of weights in each group
+            # does not meet the minimum weight threshold (fractional).
            if self._min_weight > 0.0:
+                # The sum of all weights in each group (i.e., full coverage)
+                weight_sum_all = self._group_data(self._weights).sum(skipna=skipna)
+
+                # Fraction of weights present in each group.
+                weight_fraction = masked_weights_group_sum / weight_sum_all
+
+                # Mask the averaged data where the weight fraction is below
+                # the minimum weight threshold.
                dv_avg = xr.where(
-                    masked_weights_group_sum >= self._min_weight,
+                    weight_fraction >= self._min_weight,


@tomvothecoder I was able to get the same plot using the latest version of this branch!

Copilot

Pull Request Overview

This PR adds a min_weight threshold parameter to temporal group averaging APIs, allowing users to set a minimum data coverage requirement for temporal averaging operations. This ensures temporal averages are only computed when sufficient data is available.

Adds min_weight parameter to group_average, climatology, and departures methods
Implements weight threshold validation and masking in weighted temporal averaging
Refactors weighted averaging logic into a dedicated _weighted_group_average method

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
xcdat/temporal.py	Adds min_weight parameter to temporal averaging methods and implements weighted averaging logic with threshold validation
tests/test_temporal.py	Updates existing tests to include expected min_weight attributes and adds comprehensive test coverage for min_weight threshold functionality

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

xcdat/temporal.py

Add `_get_masked_weights()` Update the order of methods Convert temporal weight threshold methods to general functions - Add tests for threshold functions Add tests for utils Apply suggestions from code review Apply suggestions from code review Fix masking of data in `_group_average()` - This method now masks data using the weights grouped properly, instead of using the `weight_var_with_weight_threshold()` function - Add `from __future__ import annotations` to `spatial.py` - Add and update units Fix docstring Remove unused `_mask_var_with_weight_threshold()` function Fix TypeError due to annotaitons

- Add tests

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tomvothecoder added the type: enhancement New enhancement request label Jul 30, 2024

tomvothecoder added this to the FY24Q4 (07/01/24 - 09/30/24) milestone Jul 30, 2024

tomvothecoder self-assigned this Jul 30, 2024

tomvothecoder mentioned this pull request Jul 30, 2024

Add weight threshold option for spatial averaging #672

Merged

9 tasks

tomvothecoder commented Aug 27, 2024

View reviewed changes

xcdat/utils.py Outdated Show resolved Hide resolved

tomvothecoder commented Aug 27, 2024

View reviewed changes

xcdat/temporal.py Outdated Show resolved Hide resolved

tomvothecoder force-pushed the feature/531-temporal-apis branch from 7e9635f to bd872cf Compare September 4, 2024 19:37

tomvothecoder marked this pull request as ready for review September 4, 2024 19:45

tomvothecoder modified the milestones: FY24Q4 (07/01/24 - 09/30/24), FY25Q1 (10/01/24 - 12/31/24) Oct 7, 2024

tomvothecoder added the project: seats-year3 A SEATS goals for year 3. label Oct 10, 2024

tomvothecoder modified the milestones: FY25Q1 (10/01/24 - 12/31/24), FY25 Q2 (01/01/25 - 03/31/25) Jan 17, 2025

tomvothecoder modified the milestones: FY25Q2 (01/01/25 - 03/31/25), FY25Q3 (04/01/25 - 06/30/25) Apr 14, 2025

tomvothecoder marked this pull request as draft April 23, 2025 22:35

tomvothecoder modified the milestones: FY25Q3 (04/01/25 - 06/30/25), FY24Q4 (07/01/25 - 09/30/25) Jun 30, 2025

tomvothecoder added project: seats-year5 A SEATS goal for year 5. and removed project: seats-year3 A SEATS goals for year 3. labels Jun 30, 2025

tomvothecoder force-pushed the feature/531-temporal-apis branch 2 times, most recently from 9de2d94 to 059ce37 Compare July 17, 2025 18:16

tomvothecoder force-pushed the feature/531-temporal-apis branch from 059ce37 to 5ef344f Compare July 17, 2025 20:29

tomvothecoder force-pushed the feature/531-temporal-apis branch from 3c9e799 to 2cb27ce Compare August 14, 2025 20:41

tomvothecoder force-pushed the feature/531-temporal-apis branch 2 times, most recently from b3ed292 to d353df8 Compare August 19, 2025 23:50

tomvothecoder commented Aug 20, 2025

View reviewed changes

tomvothecoder requested review from Copilot and pochedls August 20, 2025 00:03

This comment was marked as outdated.

Sign in to view

lee1043 approved these changes Aug 20, 2025

View reviewed changes

tomvothecoder requested a review from Copilot August 21, 2025 16:13

Copilot AI reviewed Aug 21, 2025

View reviewed changes

xcdat/temporal.py Outdated Show resolved Hide resolved

tomvothecoder and others added 14 commits August 21, 2025 10:54

Fix tests

1e3cef9

Update tests

261e203

Revert rebase changes

cde8740

Fix test

cf6aa70

Add min_weight to climatology and departures

7a69a89

- Add tests

Add PR review suggestions

3e1f301

Remove unnecessary future import

45fff2e

Update test comments

f783685

Fix masking to use weight fraction instead of abs sum

23f927f

Get masked weights using original variable not weighted

6b3aa8e

Add min_weight attr to final averaged dataarray .attrs

d59ce11

Update order of min_weight attr in tests

63227a9

Update xcdat/temporal.py

f21b2b6

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

tomvothecoder force-pushed the feature/531-temporal-apis branch from 39975bd to f21b2b6 Compare August 21, 2025 17:54

tomvothecoder merged commit 58dac72 into main Aug 21, 2025
10 checks passed

github-project-automation bot moved this from In Review to Done in xCDAT Development Aug 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add weight threshold option for temporal operations#683

Add weight threshold option for temporal operations#683
tomvothecoder merged 14 commits intomainfrom
feature/531-temporal-apis

tomvothecoder commented Jul 30, 2024 •

edited

Loading

codecov bot commented Jul 30, 2024 •

edited

Loading

tomvothecoder Aug 27, 2024

Uh oh!

Uh oh!

tomvothecoder commented Mar 26, 2025

lee1043 commented Apr 23, 2025

tomvothecoder commented Apr 23, 2025

tomvothecoder commented Jul 17, 2025

pochedls commented Jul 17, 2025

tomvothecoder commented Aug 14, 2025 •

edited

Loading

lee1043 commented Aug 14, 2025

tomvothecoder commented Aug 14, 2025

tomvothecoder Aug 20, 2025

lee1043 Aug 20, 2025

This comment was marked as outdated.

Uh oh!

lee1043 left a comment

Uh oh!

Uh oh!

lee1043 Aug 20, 2025

Copilot AI left a comment

Uh oh!

Uh oh!

Labels

4 participants

	def _mask_var_with_weight_threshold(
	self,
	dv: xr.DataArray,
	dv_mean: xr.DataArray,
	dim: list[str],
	weights: xr.DataArray,
	min_weight: float,
	) -> xr.DataArray:
	"""Mask values that do not meet the minimum weight threshold with np.nan.

	This function is useful for cases where the weighting of data might be
	skewed based on the availability of data. For example, if a portion of
	cells in a region has significantly more missing data than other other
	regions, it can result in inaccurate spatial average calculations.
	Masking values that do not meet the minimum weight threshold ensures
	more accurate calculations.

	Parameters
	----------
	dv : xr.DataArray
	The weighted variable used for getting masked weights.
	dv_mean : xr.DataArray
	The average of the weighted variable.
	dim: list[str]:
	List of axis dimensions to average over.
	weights : xr.DataArray
	A DataArray containing either the regional weights used for weighted
	averaging. ``weights`` must include the same axis dimensions and
	dimensional sizes as the data variable.
	min_weight : float, optional
	Minimum threshold of data coverage (weight) required to compute
	a spatial average for a grouping window. Must be between 0 and 1.
	Useful for ensuring accurate averages in regions with missing data,
	by default None (equivalent to 0.0).

	The value must be between 0 and 1, where:
	- 0/``None`` means no minimum threshold (all data is considered,
	even if coverage is minimal).
	- 1 means full data coverage is required (no missing data is
	allowed).

	Returns
	-------
	xr.DataArray
	The average of the weighted variable with the minimum weight
	threshold applied.
	"""
	# Sum all weights, including zero for missing values.
	weight_sum_all = weights.sum(dim=dim)

	masked_weights = _get_masked_weights(dv, weights)
	weight_sum_masked = masked_weights.sum(dim=dim)

	# Get fraction of the available weight.
	frac = weight_sum_masked / weight_sum_all

	# Nan out values that don't meet specified weight threshold.
	dv_new = xr.where(frac >= min_weight, dv_mean, np.nan, keep_attrs=True)
	dv_new.name = dv_mean.name

	return dv_new

Conversation

tomvothecoder commented Jul 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

More Context

Checklist

codecov bot commented Jul 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

tomvothecoder Aug 27, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tomvothecoder commented Mar 26, 2025

lee1043 commented Apr 23, 2025

tomvothecoder commented Apr 23, 2025

tomvothecoder commented Jul 17, 2025

pochedls commented Jul 17, 2025

tomvothecoder commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

lee1043 commented Aug 14, 2025

tomvothecoder commented Aug 14, 2025

tomvothecoder Aug 20, 2025

Choose a reason for hiding this comment

Before

After

Next Steps

Side Note

lee1043 Aug 20, 2025

Choose a reason for hiding this comment

This comment was marked as outdated.

Uh oh!

lee1043 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lee1043 Aug 20, 2025

Choose a reason for hiding this comment

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Labels

4 participants

tomvothecoder commented Jul 30, 2024 •

edited

Loading

codecov bot commented Jul 30, 2024 •

edited

Loading

tomvothecoder commented Aug 14, 2025 •

edited

Loading