Module langbrainscore.utils.xarray

import xarray as xr
from sklearn.impute import SimpleImputer

def copy_metadata(target: xr.DataArray, source: xr.DataArray, dim: str) -> xr.DataArray:
    """Copies the metadata coordinates of a source xarray on dimension `dim` over to a target xarray.

    For this to work, the two `xr.DataArray` objects must have the same dimensions and
    dimensionality of data, minimally in the `dim` dimension.

    Args:
        target (xr.DataArray): target xarray to copy the metadata coordinates onto
            (a copy is made; this does not happen in place)
        source (xr.DataArray): the source xarray for the metadata coordinates along `dim`
        dim (str): dimension of the metadata coordinates (see `xr` documentation for help)

    Returns:
        xr.DataArray: a copy of `target` carrying the coordinates of `source` along `dim`
    """
    for coord in source[dim].coords:
        target = target.assign_coords({coord: (dim, source[coord].data)})
    return target

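def collapse_multidim_coord(xr_obj, coord, keep_dim):
    """As a result of iterative construction of `xarray`s in our various functions
    (such as in the `HuggingFaceEncoder.encode` method), the same values are repeated
    over and over again along the extra dimensions of a coordinate. This collapses
    `coord` so that it lies along `keep_dim` only.

    Args:
        xr_obj (xr.DataArray): xarray object holding the multi-dimensional coordinate
        coord (str): name of the coordinate to collapse
        keep_dim (str): name of the dimension to keep the collapsed coordinate on

    Returns:
        xr.DataArray: a copy of `xr_obj` with `coord` re-assigned along `keep_dim`
    """
    imputer = SimpleImputer(strategy="most_frequent")
    try:
        # take the first row of the imputed 2D coordinate values
        stimuli = imputer.fit_transform(xr_obj[coord])[0]
        return xr_obj.assign_coords({coord: (keep_dim, stimuli)})
    except ValueError as e:  # TODO which exception? what scenario does this cover?
        # retry with the transposed values, i.e. take the first column instead
        stimuli = imputer.fit_transform(xr_obj[coord]).transpose()[0]
        return xr_obj.assign_coords({coord: (keep_dim, stimuli)})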

def fix_xr_dtypes(xr_obj):
    """Sometimes xarrays end up having dtype='O' (object) instead of the
    expected dtype, which is usually 'str'. Casts every such coordinate to `str`.
    """
    for c in xr_obj.coords:
        if xr_obj[c].dtype == "O":
            xr_obj[c] = xr_obj[c].astype(str)
    return xr_obj  # this is likely not necessary --- the xr_obj should be modified in-place


def collapse_multidim_coord(xr_obj, coord, keep_dim)

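As a result of iterative construction of xarrays in our various functions (such as in the HuggingFaceEncoder.encode method), the same values are repeated over and over again along the extra dimensions of a coordinate. This function collapses coord so that it lies along keep_dim only.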


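xr_obj : xr.DataArray
xarray object holding the multi-dimensional coordinate
coord : str
name of the coordinate to collapse
keep_dim : str
name of the dimension to keep the collapsed coordinate on (it is passed to assign_coords as a dimension name)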
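
The collapsing step can be illustrated without SimpleImputer. The following is a minimal NumPy-only sketch, not the library's implementation: it assumes the coordinate values are exactly repeated along every dimension other than keep_dim (no missing entries), so taking the first value is enough. The helper name collapse_repeated_coord and the dimension and coordinate names are hypothetical.

```python
import numpy as np
import xarray as xr


def collapse_repeated_coord(xr_obj, coord, keep_dim):
    """Hypothetical simplification: keep the first value along all other dims."""
    # Move keep_dim to the front, then flatten the remaining dimensions.
    values = xr_obj[coord].transpose(keep_dim, ...).values
    first = values.reshape(values.shape[0], -1)[:, 0]
    # assign_coords returns a new object; the input is left untouched.
    return xr_obj.assign_coords({coord: (keep_dim, first)})


# A 2D coordinate whose values repeat along "timeid" (assumed example data).
stimulus = np.array([["a", "a"], ["b", "b"], ["c", "c"]], dtype=object)
data = xr.DataArray(
    np.zeros((3, 2)),
    dims=("sampleid", "timeid"),
    coords={"stimulus": (("sampleid", "timeid"), stimulus)},
)
collapsed = collapse_repeated_coord(data, "stimulus", "sampleid")
```

After the call, the stimulus coordinate of collapsed has the single dimension sampleid, while data still carries the original two-dimensional coordinate.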


def copy_metadata(target: xarray.core.dataarray.DataArray, source: xarray.core.dataarray.DataArray, dim: str) -> xarray.core.dataarray.DataArray

Copies the metadata coordinates of a source xarray on dimension dim over to a target xarray. For this to work, the two xr.DataArray objects must have the same dimensions and dimensionality of data, minimally in the dim dimension.


target : xr.DataArray
target xarray to copy the metadata coordinates onto (a copy is made; this does not happen in place)
source : xr.DataArray
the source xarray for the metadata coordinates along dim
dim : str
dimension of the metadata coordinates (see xr documentation for help)
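
A short usage sketch may help here. The function body is reproduced from the module source above so the block is self-contained; the array shapes, the sampleid and neuroid dimension names, and the stimulus coordinate are made-up example data.

```python
import numpy as np
import xarray as xr


def copy_metadata(target: xr.DataArray, source: xr.DataArray, dim: str) -> xr.DataArray:
    # Reproduced from the module source: re-assign every coordinate that
    # lives on `dim` in `source` onto `target`.
    for coord in source[dim].coords:
        target = target.assign_coords({coord: (dim, source[coord].data)})
    return target


# Source array with an index coordinate and one extra metadata coordinate
# along "sampleid" (assumed example data).
source = xr.DataArray(
    np.arange(6).reshape(3, 2),
    dims=("sampleid", "neuroid"),
    coords={
        "sampleid": [10, 11, 12],
        "stimulus": ("sampleid", ["the", "quick", "fox"]),
    },
)

# Target has the same length along "sampleid" but no coordinates yet.
target = xr.DataArray(np.zeros((3, 4)), dims=("sampleid", "neuroid"))

result = copy_metadata(target, source, "sampleid")
```

Here result carries both sampleid and stimulus along the sampleid dimension, and target itself is unchanged because assign_coords returns a new object on each iteration.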



def fix_xr_dtypes(xr_obj)

Sometimes xarrays end up having dtype='O' (object) instead of the expected dtype, which is usually 'str'. This function casts every such coordinate to str.
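
The cast can be sketched as follows. This is a hedged variant rather than the module's code: the hypothetical helper fix_object_coords uses assign_coords and returns a new object instead of assigning into xr_obj, which sidesteps the question of whether the input is modified in place. The coordinate and dimension names are example data.

```python
import numpy as np
import xarray as xr


def fix_object_coords(xr_obj):
    """Hypothetical variant: cast object-dtype coordinates to str, returning a new object."""
    # Iterate over a snapshot of the names, since coordinates are re-assigned in the loop.
    for name in list(xr_obj.coords):
        if xr_obj[name].dtype == object:
            xr_obj = xr_obj.assign_coords({name: xr_obj[name].astype(str)})
    return xr_obj


# A non-index coordinate stored with dtype=object (assumed example data).
data = xr.DataArray(
    np.zeros(2),
    dims=("sampleid",),
    coords={"stimulus": ("sampleid", np.array(["a", "b"], dtype=object))},
)
fixed = fix_object_coords(data)
```

The stimulus coordinate of fixed has a NumPy unicode dtype, while the one on data keeps dtype object.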
