Add weight threshold option for temporal operations#683
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #683 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 1768 1781 +13
=========================================
+ Hits 1768 1781 +13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
xcdat/utils.py
Outdated
| This function is useful for cases where the weighting of data might be | ||
| skewed based on the availability of data. For example, if one season in a | ||
| time series has more significantly more missing data than other seasons, it | ||
| can result in inaccurate calculations of climatologies. Masking values that | ||
| do not meet the minimum weight threshold ensures more accurate calculations. |
There was a problem hiding this comment.
Is this an accurate description?
7e9635f to
bd872cf
Compare
|
To test this PR against real-world data, we can:
|
|
|
I am converting this PR back to draft and will tag you once it is ready. |
9de2d94 to
059ce37
Compare
059ce37 to
5ef344f
Compare
3c9e799 to
2cb27ce
Compare
Thank you for the review so far @lee1043. @pochedls can you or someone else grant me read permission to |
|
@tomvothecoder I put the input data here in NERSC, can you check if you can access to it? /pscratch/sd/l/lee1043/temporary/ts_mon_HadISST-1-1_PCMDI_gn_187001-202501.nc |
This works, thanks Jiwoo. |
b3ed292 to
d353df8
Compare
xcdat/temporal.py
Outdated
| # Set averaged data to NaN where masked weights are below the | ||
| # minimum threshold. | ||
| # Mask averaged data where the fraction of weights in each group | ||
| # does not meet the minimum weight threshold (fractional). | ||
| if self._min_weight > 0.0: | ||
| # The sum of all weights in each group (i.e., full coverage) | ||
| weight_sum_all = self._group_data(self._weights).sum(skipna=skipna) | ||
|
|
||
| # Fraction of weights present in each group. | ||
| weight_fraction = masked_weights_group_sum / weight_sum_all | ||
|
|
||
| # Mask the averaged data where the weight fraction is below | ||
| # the minimum weight threshold. | ||
| dv_avg = xr.where( | ||
| masked_weights_group_sum >= self._min_weight, | ||
| weight_fraction >= self._min_weight, |
There was a problem hiding this comment.
Hey @lee1043 and @pochedls, I identified and fixed a bug after reviewing your comments, as shown in the code diff here.
Before
This compared against the absolute sum of valid weights, which could behave inconsistently across groups with different total weights.
# Set averaged data to NaN where masked weights are below the
# minimum threshold.
if self._min_weight > 0.0:
dv_avg = xr.where(
masked_weights_group_sum >= self._min_weight,
...
)After
Now the check is fractional coverage (masked / total), ensuring consistent thresholding across groups of different sizes.
# Mask averaged data where the fraction of weights in each group
# does not meet the minimum weight threshold (fractional).
if self._min_weight > 0.0:
# The sum of all weights in each group (i.e., full coverage)
weight_sum_all = self._group_data(self._weights).sum(skipna=skipna)
# Fraction of weights present in each group
weight_fraction = masked_weights_group_sum / weight_sum_all
# Mask the averaged data where the weight fraction is below
# the minimum weight threshold
dv_avg = xr.where(
weight_fraction >= self._min_weight,
...
)This logic matches the spatial average weight threshold function here:
Lines 784 to 844 in 15c5ee7
With this change, @lee1043’s script (here) now produces plots for all min_weight settings (shown below):
Next Steps
@lee1043 and @pochedls — can you review again to confirm this new logic is correct?
Side Note
My unit tests didn’t catch this bug and still pass with the new logic. This highlights the importance of validating with real data alongside unit tests, as we routinely do.
There was a problem hiding this comment.
@tomvothecoder I was able to get the same plot using the latest version of this branch!
lee1043
left a comment
There was a problem hiding this comment.
Looks good to me! Thanks for the PR, @tomvothecoder!
xcdat/temporal.py
Outdated
| # Set averaged data to NaN where masked weights are below the | ||
| # minimum threshold. | ||
| # Mask averaged data where the fraction of weights in each group | ||
| # does not meet the minimum weight threshold (fractional). | ||
| if self._min_weight > 0.0: | ||
| # The sum of all weights in each group (i.e., full coverage) | ||
| weight_sum_all = self._group_data(self._weights).sum(skipna=skipna) | ||
|
|
||
| # Fraction of weights present in each group. | ||
| weight_fraction = masked_weights_group_sum / weight_sum_all | ||
|
|
||
| # Mask the averaged data where the weight fraction is below | ||
| # the minimum weight threshold. | ||
| dv_avg = xr.where( | ||
| masked_weights_group_sum >= self._min_weight, | ||
| weight_fraction >= self._min_weight, |
There was a problem hiding this comment.
@tomvothecoder I was able to get the same plot using the latest version of this branch!
There was a problem hiding this comment.
Pull Request Overview
This PR adds a min_weight threshold parameter to temporal group averaging APIs, allowing users to set a minimum data coverage requirement for temporal averaging operations. This ensures temporal averages are only computed when sufficient data is available.
- Adds
min_weightparameter togroup_average,climatology, anddeparturesmethods - Implements weight threshold validation and masking in weighted temporal averaging
- Refactors weighted averaging logic into a dedicated
_weighted_group_averagemethod
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| xcdat/temporal.py | Adds min_weight parameter to temporal averaging methods and implements weighted averaging logic with threshold validation |
| tests/test_temporal.py | Updates existing tests to include expected min_weight attributes and adds comprehensive test coverage for min_weight threshold functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Add `_get_masked_weights()` Update the order of methods Convert temporal weight threshold methods to general functions - Add tests for threshold functions Add tests for utils Apply suggestions from code review Apply suggestions from code review Fix masking of data in `_group_average()` - This method now masks data using the weights grouped properly, instead of using the `weight_var_with_weight_threshold()` function - Add `from __future__ import annotations` to `spatial.py` - Add and update units Fix docstring Remove unused `_mask_var_with_weight_threshold()` function Fix TypeError due to annotaitons
- Add tests
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
39975bd to
f21b2b6
Compare
Description
This PR adds a
min_weightthreshold parameter to temporal group averaging APIs, allowing users to set a minimum data coverage requirement for temporal averaging. This enhancement ensures temporal averages are only computed when sufficient data is available, with values below the threshold being set to NaN.min_weightparameter togroup_average,climatology, anddeparturesmethods_weighted_group_averagemethodMore Context
Checklist
If applicable: