Base Rule

Abstract base class for custom data quality rules.

CustomRule

Bases: ABC

Abstract base class for custom data quality rules.

Custom rules allow users to define domain-specific data quality checks beyond the built-in detectors. They implement a check() method that analyzes a DataFrame and returns GhostFinding objects for any violations.

Rules can be single-column (applying to a specific column) or cross-column (applying logic across multiple columns or the entire DataFrame). They should handle both Pandas and Polars DataFrames by detecting the backend and using the appropriate API.

Subclasses must implement the check() method. Common rule types are provided in lavendertown.rules.executors (RangeRule, RegexRule, EnumRule).
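
The cross-column case described above can be sketched as follows. This is a minimal illustration, not the library's own code: `Finding` is a hypothetical stand-in for `GhostFinding` (whose fields are not shown on this page), `CustomRule` is a local copy of the base class documented here so the block runs on its own, and `EndAfterStartRule` with its `start_col`/`end_col` parameters is invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical stand-in for GhostFinding (real fields not shown here)."""
    ghost_type: str
    column: Optional[str]
    row_index: int
    description: str


class CustomRule(ABC):
    """Local copy of the documented base class so the sketch runs on its own."""

    def __init__(self, name: str, description: str, column: Optional[str] = None):
        self.name = name
        self.description = description
        self.column = column

    @abstractmethod
    def check(self, df: object) -> list:
        ...


class EndAfterStartRule(CustomRule):
    """Cross-column rule: every end value must be >= its start value."""

    def __init__(self, start_col: str, end_col: str):
        # Leaving column at its default (None) marks the rule as cross-column.
        super().__init__("end_after_start", f"{end_col} must not precede {start_col}")
        self.start_col = start_col
        self.end_col = end_col

    def check(self, df):
        findings = []
        # Column lookup by name plus iteration works for pandas and polars
        # DataFrames, and for a plain dict of lists.
        pairs = zip(df[self.start_col], df[self.end_col])
        for i, (start, end) in enumerate(pairs):
            if start is None or end is None:
                continue  # missing values are left to other checks
            if end < start:
                findings.append(Finding("rule", None, i, self.description))
        return findings
```

Calling `EndAfterStartRule("start", "end").check(df)` returns one finding per offending row position and an empty list when every row is consistent.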

Attributes:

name (str): Human-readable name of the rule.
description (str): Description of what the rule checks.
column (str | None): Column name to apply the rule to, or None for cross-column rules.

Example

Implement a custom rule:

from lavendertown.rules.base import CustomRule
from lavendertown.models import GhostFinding
from lavendertown.detectors.base import detect_dataframe_backend

class PositiveValueRule(CustomRule):
    def __init__(self, column: str):
        super().__init__("positive_values", "Check for positive values", column)

    def check(self, df):
        backend = detect_dataframe_backend(df)
        findings = []
        # Check for negative values
        # ... detection logic ...
        return findings
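
The elided detection logic in the example above might be filled in along these lines. This is a sketch under stated assumptions: `Finding` is a hypothetical stand-in for `GhostFinding` (its real constructor is not shown on this page), `CustomRule` is a local copy of the documented base class, and flagging only values below zero follows the example's own "Check for negative values" comment.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical stand-in for GhostFinding (real fields not shown here)."""
    ghost_type: str
    column: Optional[str]
    description: str
    row_indices: list


class CustomRule(ABC):
    """Local copy of the documented base class so the sketch runs on its own."""

    def __init__(self, name: str, description: str, column: Optional[str] = None):
        self.name = name
        self.description = description
        self.column = column

    @abstractmethod
    def check(self, df: object) -> list:
        ...


class PositiveValueRule(CustomRule):
    def __init__(self, column: str):
        super().__init__("positive_values", "Check for positive values", column)

    def check(self, df):
        # Column lookup by name plus iteration works for pandas and polars
        # DataFrames, and for a plain dict of lists.
        values = list(df[self.column])
        # None is skipped explicitly; NaN compares False against 0, so it is
        # never flagged either.
        bad = [i for i, v in enumerate(values) if v is not None and v < 0]
        if not bad:
            return []
        message = f"{len(bad)} negative value(s) in column {self.column!r}"
        return [Finding("rule", self.column, message, bad)]
```

The sketch reports row positions rather than index labels, and it collapses all violations in the column into a single finding; one finding per row would be an equally reasonable design.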
Source code in lavendertown/rules/base.py
class CustomRule(ABC):
    """Abstract base class for custom data quality rules.

    Custom rules allow users to define domain-specific data quality checks
    beyond the built-in detectors. They implement a check() method that
    analyzes a DataFrame and returns GhostFinding objects for any violations.

    Rules can be single-column (applying to a specific column) or cross-column
    (applying logic across multiple columns or the entire DataFrame). They
    should handle both Pandas and Polars DataFrames by detecting the backend
    and using the appropriate API.

    Subclasses must implement the check() method. Common rule types are
    provided in lavendertown.rules.executors (RangeRule, RegexRule, EnumRule).

    Attributes:
        name: Human-readable name of the rule.
        description: Description of what the rule checks.
        column: Column name to apply the rule to, or None for cross-column rules.

    Example:
        Implement a custom rule::

            from lavendertown.rules.base import CustomRule
            from lavendertown.models import GhostFinding
            from lavendertown.detectors.base import detect_dataframe_backend

            class PositiveValueRule(CustomRule):
                def __init__(self, column: str):
                    super().__init__("positive_values", "Check for positive values", column)

                def check(self, df):
                    backend = detect_dataframe_backend(df)
                    findings = []
                    # Check for negative values
                    # ... detection logic ...
                    return findings
    """

    def __init__(self, name: str, description: str, column: str | None = None):
        """Initialize the custom rule.

        Args:
            name: Human-readable rule name. Should be unique within a rule set.
            description: Rule description explaining what the rule checks.
                This description may be included in finding descriptions.
            column: Column name to apply the rule to. Use None for cross-column
                rules that don't apply to a specific column.
        """
        self.name = name
        self.description = description
        self.column = column

    @abstractmethod
    def check(self, df: object) -> list[GhostFinding]:
        """Check the rule against a DataFrame.

        This is the main method that subclasses must implement. It should
        analyze the DataFrame according to the rule's logic and return
        findings for any violations.

        Args:
            df: DataFrame to check. Can be a pandas.DataFrame or
                polars.DataFrame. The rule should use
                ``detect_dataframe_backend()`` to determine which API to use.

        Returns:
            List of GhostFinding objects representing rule violations.
            All findings should have ghost_type="rule". Returns an empty
            list if no violations are found.

        Note:
            Rules should handle both Pandas and Polars DataFrames. Use
            ``detect_dataframe_backend()`` to determine the backend and
            use the appropriate API.
        """
        pass

    def get_name(self) -> str:
        """Get the name of this rule.

        Returns:
            The rule's name as specified during initialization.
        """
        return self.name

Functions

__init__

__init__(name, description, column=None)

Parameters:

name (str, required): Human-readable rule name. Should be unique within a rule set.
description (str, required): Rule description explaining what the rule checks. This description may be included in finding descriptions.
column (str | None, default None): Column name to apply the rule to. Use None for cross-column rules that don't apply to a specific column.

check (abstractmethod)

check(df)

Check the rule against a DataFrame.

This is the main method that subclasses must implement. It should analyze the DataFrame according to the rule's logic and return findings for any violations.
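
Because `check()` is declared with `@abstractmethod` on an `ABC` subclass, Python refuses to instantiate any subclass that does not override it. The sketch below demonstrates this with a trimmed local copy of the base class; `IncompleteRule`, `NoOpRule`, and `can_instantiate` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class CustomRule(ABC):
    """Trimmed local copy of the documented base class."""

    def __init__(self, name, description, column=None):
        self.name = name
        self.description = description
        self.column = column

    @abstractmethod
    def check(self, df):
        ...


class IncompleteRule(CustomRule):
    """Forgets to override check()."""


class NoOpRule(CustomRule):
    def check(self, df):
        return []  # reports no violations


def can_instantiate(cls) -> bool:
    """Return True if cls can be constructed, False if Python raises TypeError."""
    try:
        cls("demo", "demo rule")
    except TypeError:
        return False
    return True
```

Here `can_instantiate(NoOpRule)` succeeds, while both `IncompleteRule` and `CustomRule` itself fail at construction time rather than later, when `check()` would first be called.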

Parameters:

df (object, required): DataFrame to check. Can be a pandas.DataFrame or polars.DataFrame. The rule should use detect_dataframe_backend() to determine which API to use.

Returns:

list[GhostFinding]: List of GhostFinding objects representing rule violations. All findings should have ghost_type="rule". Returns an empty list if no violations are found.

Note

Rules should handle both Pandas and Polars DataFrames. Use detect_dataframe_backend() to determine the backend and use the appropriate API.
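
The backend dispatch described in this note can be sketched as below. The return value of `detect_dataframe_backend()` is not documented on this page, so the example uses `guess_backend`, a hypothetical stand-in that inspects the module a DataFrame's class comes from; `count_nulls` is likewise invented for illustration. The pandas call (`Series.isna().sum()`) and the polars call (`Series.null_count()`) are the standard ones for each library.

```python
def guess_backend(df: object) -> str:
    """Hypothetical stand-in for detect_dataframe_backend().

    Returns "pandas" or "polars" based on the top-level package that
    defines the object's class, and raises TypeError for anything else.
    """
    root = type(df).__module__.split(".")[0]
    if root in ("pandas", "polars"):
        return root
    raise TypeError(f"Unsupported DataFrame type: {type(df)!r}")


def count_nulls(df: object, column: str) -> int:
    """Count missing values in one column using the backend's own API."""
    backend = guess_backend(df)
    if backend == "pandas":
        # pandas: boolean mask of missing values, summed
        return int(df[column].isna().sum())
    # polars: Series exposes the null count directly
    return int(df[column].null_count())
```

Inside a rule's `check()`, the same pattern applies: resolve the backend once, then branch to the matching API before building findings.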

get_name

get_name()

Get the name of this rule.

Returns:

str: The rule's name as specified during initialization.
