Enhancing _detect_singletons #229

@styfenschaer

Hi @s3alfisc

I wanted to look at the singleton detection to see whether it can be sped up. You implemented it in commit 4eaef7d. May I ask why you always break out of the inner loop (see the code below)? Because of that unconditional break, only the first column is ever checked. For an array like the following, for example, the 3rd row is not detected as a singleton:

ids = np.array(
    [
        [0, 2, 1],
        [0, 2, 1],
        [0, 1, 3],
        [0, 1, 2],
        [0, 1, 2],
    ]
)

Probably I misunderstand the idea of singletons.
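
To make the example concrete, here is a quick check of the per-column level counts. It assumes that a singleton is a fixed-effect level that occurs in exactly one observation; the name `singleton_rows` is only for illustration and is not part of the package.

```python
import numpy as np

ids = np.array(
    [
        [0, 2, 1],
        [0, 2, 1],
        [0, 1, 3],
        [0, 1, 2],
        [0, 1, 2],
    ]
)

# For every column, count how often each level occurs and collect
# the rows whose level occurs exactly once (assumed singleton definition).
singleton_rows = {}
for x in range(ids.shape[1]):
    counts = np.bincount(ids[:, x])
    singleton_levels = np.flatnonzero(counts == 1)
    rows = np.flatnonzero(np.isin(ids[:, x], singleton_levels))
    singleton_rows[x] = rows.tolist()
    print(f"column {x}: counts={counts.tolist()}, singleton rows={rows.tolist()}")

# column 0: counts=[5], singleton rows=[]
# column 1: counts=[0, 3, 2], singleton rows=[]
# column 2: counts=[0, 2, 2, 1], singleton rows=[2]
```

Under that definition, level 3 in the last column occurs only once, so the row with index 2 (the 3rd row) would be a singleton, but the first column alone gives no hint of it.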

Source
def _detect_singletons(ids):
    """
    Detect singleton fixed effects
    Args:
        ids (np.ndarray): A numpy array of fixed effects.
    Returns:
        An array of booleans indicating which observations have a singleton fixed effect.
    """

    N, k = ids.shape

    singleton_idx = np.full(N, False, dtype=bool)
    singleton_idx_tmp = np.full(N, False, dtype=bool)
    singleton_idx_tmp_old = singleton_idx.copy()

    ids_tmp = ids.copy()

    while True:
        for x in range(k):
            col = ids[:, x]
            col_tmp = ids_tmp[:, x]

            # note that this only "works" as fixed effects are integers from 0, 1, ..., n_fixef -1
            # and np.bincount orders results in ascending (integer) order
            counts = np.bincount(col_tmp)

            if np.any(counts == 1):
                idx = np.where(counts == 1)

                singleton_idx[np.isin(col, idx)] = True
                singleton_idx_tmp[np.isin(col_tmp, idx)] = True

                ids_tmp = ids_tmp[~singleton_idx_tmp, :].astype(ids_tmp.dtype)
                singleton_idx_tmp = singleton_idx_tmp[~singleton_idx_tmp]

            break

        if np.array_equal(singleton_idx_tmp, singleton_idx_tmp_old):
            break

        singleton_idx_tmp_old = singleton_idx_tmp.copy()

    return singleton_idx
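
For comparison, here is a minimal sketch of what I would have expected, assuming every column should be checked and that dropping a singleton row can create new singletons in other columns. The function name `detect_singletons_sketch` is hypothetical and this is not the code from the commit. Like the original, it relies on the fixed effects being non-negative integers so that `np.bincount` can be used.

```python
import numpy as np


def detect_singletons_sketch(ids):
    """Hypothetical variant: flag rows whose level occurs exactly once in
    any column, repeating until no new singletons appear."""
    ids = np.asarray(ids)
    n_obs, n_cols = ids.shape
    keep = np.ones(n_obs, dtype=bool)  # rows not (yet) flagged as singletons

    changed = True
    while changed:
        changed = False
        for j in range(n_cols):
            col = ids[keep, j]  # levels of the rows that are still kept
            if col.size == 0:
                break
            counts = np.bincount(col)
            lonely = counts[col] == 1  # kept rows whose level occurs once
            if lonely.any():
                kept_rows = np.flatnonzero(keep)
                keep[kept_rows[lonely]] = False
                changed = True

    return ~keep


ids = np.array(
    [
        [0, 2, 1],
        [0, 2, 1],
        [0, 1, 3],
        [0, 1, 2],
        [0, 1, 2],
    ]
)
result = detect_singletons_sketch(ids)
print(result.tolist())  # [False, False, True, False, False]
```

With this variant the 3rd row of the example is flagged. Whether that matches the intended meaning of singletons is exactly my question.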

Labels: bug, enhancement
