Base Rule
Abstract base class for custom data quality rules.
CustomRule
Bases: ABC
Abstract base class for custom data quality rules.
Custom rules allow users to define domain-specific data quality checks beyond the built-in detectors. They implement a check() method that analyzes a DataFrame and returns GhostFinding objects for any violations.
Rules can be single-column (applying to a specific column) or cross-column (applying logic across multiple columns or the entire DataFrame). They should handle both Pandas and Polars DataFrames by detecting the backend and using the appropriate API.
Subclasses must implement the check() method. Common rule types are provided in lavendertown.rules.executors (RangeRule, RegexRule, EnumRule).
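The contract above can be illustrated with a simplified, self-contained stand-in. The `CustomRule` class below only mirrors the documented constructor arguments, `check()`, and `get_name()`. Storing the arguments as same-named attributes is an assumption, and `IncompleteRule` and `NoOpRule` are hypothetical subclasses. The sketch shows that Python's `abc` machinery refuses to instantiate a subclass that does not override `check()`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class CustomRule(ABC):
    """Simplified stand-in mirroring the documented CustomRule interface."""

    def __init__(self, name: str, description: str, column: str | None = None):
        # Assumption: constructor arguments are kept as same-named attributes,
        # matching the Attributes table.
        self.name = name
        self.description = description
        self.column = column

    @abstractmethod
    def check(self, df: object) -> list:
        """Return a list of findings; an empty list means no violations."""

    def get_name(self) -> str:
        return self.name


class IncompleteRule(CustomRule):
    pass  # hypothetical subclass that does not override check()


class NoOpRule(CustomRule):
    def check(self, df: object) -> list:
        return []  # hypothetical rule that never reports a violation


try:
    IncompleteRule("incomplete", "Missing check()")
except TypeError as exc:
    print("cannot instantiate:", exc)

rule = NoOpRule("no_op", "Always passes", column="amount")
print(rule.get_name(), rule.check(None))
```

Because `check()` is declared with `@abstractmethod`, the `TypeError` is raised when the subclass is constructed, not when the rule is first run.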
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Human-readable name of the rule. |
| `description` | `str` | Description of what the rule checks. |
| `column` | `str \| None` | Column name to apply the rule to, or `None` for cross-column rules. |
Example
Implement a custom rule:

```python
from lavendertown.rules.base import CustomRule
from lavendertown.models import GhostFinding
from lavendertown.detectors.base import detect_dataframe_backend


class PositiveValueRule(CustomRule):
    def __init__(self, column: str):
        super().__init__("positive_values", "Check for positive values", column)

    def check(self, df):
        backend = detect_dataframe_backend(df)
        findings = []
        # Check for negative values
        # ... detection logic ...
        return findings
```
Source code in lavendertown/rules/base.py
Functions
__init__
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Human-readable rule name. Should be unique within a rule set. | *required* |
| `description` | `str` | Rule description explaining what the rule checks. This description may be included in finding descriptions. | *required* |
| `column` | `str \| None` | Column name to apply the rule to. Use `None` for cross-column rules that don't apply to a specific column. | `None` |
check (abstractmethod)
Check the rule against a DataFrame.
This is the main method that subclasses must implement. It should analyze the DataFrame according to the rule's logic and return findings for any violations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `object` | DataFrame to check. Can be a `pandas.DataFrame` or `polars.DataFrame`. The rule should use `detect_dataframe_backend()` to determine which one it received. | *required* |
Returns:
| Type | Description |
|---|---|
| `list[GhostFinding]` | List of `GhostFinding` objects representing rule violations. All findings should have `ghost_type="rule"`. Returns an empty list if no violations are found. |
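A cross-column rule (`column=None`) that follows this return contract might look like the sketch below. Only `ghost_type="rule"` comes from this page; the `Finding` dataclass, its other fields, the `check_start_before_end` function, and the dict-of-lists input (used in place of a pandas or Polars DataFrame so the sketch has no dependencies) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Finding:
    # Hypothetical stand-in for GhostFinding; only ghost_type="rule" is documented.
    ghost_type: str
    column: str | None
    description: str
    row_indices: list[int] = field(default_factory=list)


def check_start_before_end(table: dict[str, list[float]]) -> list[Finding]:
    """Cross-column check: flag rows where 'start' is later than 'end'.

    `table` is a plain dict of equal-length column lists, standing in for a
    pandas or Polars DataFrame.
    """
    bad_rows = [
        i
        for i, (start, end) in enumerate(zip(table["start"], table["end"]))
        if start > end
    ]
    if not bad_rows:
        return []  # no violations -> empty list, per the documented contract
    return [
        Finding(
            ghost_type="rule",
            column=None,  # cross-column rule: not tied to a single column
            description=f"{len(bad_rows)} row(s) have start > end",
            row_indices=bad_rows,
        )
    ]


# One finding, flagging row 1 (5 > 4).
print(check_start_before_end({"start": [1, 5, 2], "end": [3, 4, 2]}))
```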
Note
Rules should handle both Pandas and Polars DataFrames. Use `detect_dataframe_backend()` to determine the backend, then call the matching API.
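The return values of `detect_dataframe_backend()` are not documented on this page, so the sketch below uses a hypothetical `guess_backend()` helper that inspects the top-level package of the object's type and then dispatches to the pandas or Polars API. `count_negative` is likewise an illustrative name, not part of the library.

```python
from typing import Any


def guess_backend(df: Any) -> str:
    """Return 'pandas' or 'polars' from the top-level package of df's type.

    Hypothetical stand-in for detect_dataframe_backend(); reading the module
    name avoids importing either library just to run isinstance checks.
    """
    package = type(df).__module__.split(".", 1)[0]
    if package in ("pandas", "polars"):
        return package
    raise TypeError(f"Unsupported DataFrame type: {type(df).__name__}")


def count_negative(df: Any, column: str) -> int:
    """Count rows whose value in `column` is below zero."""
    if guess_backend(df) == "pandas":
        # pandas: elementwise comparison yields a boolean Series; sum() counts True
        return int((df[column] < 0).sum())
    # polars: filter rows with an expression, then read the row count
    import polars as pl  # imported lazily so pandas-only environments still work

    return df.filter(pl.col(column) < 0).height
```

Checking the module name lets a rule run when only one of the two libraries is installed; the Polars branch imports `polars` lazily for the same reason.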
get_name
Get the name of this rule.
Returns:
| Type | Description |
|---|---|
| `str` | The rule's name as specified during initialization. |