Custom Rules¶

Version

Custom rules were introduced in v0.2.0. Cross-column rules were added in v0.2.0.

LavenderTown allows you to create custom data quality rules beyond the built-in detectors. Rules can be created through the UI or programmatically.

Rule Types¶

Range Rules¶

Validate that numeric values fall within a specified range:

from lavendertown.rules.executors import RangeRule

rule = RangeRule(
    name="price_range",
    description="Price must be between 0 and 1000",
    column="price",
    min_value=0.0,
    max_value=1000.0
)

Regex Rules¶

Validate string patterns using regular expressions:

from lavendertown.rules.executors import RegexRule

rule = RegexRule(
    name="email_format",
    description="Email must match standard format",
    column="email",
    pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
)

Enum Rules¶

Validate that values are from an allowed set:

from lavendertown.rules.executors import EnumRule

rule = EnumRule(
    name="valid_status",
    description="Status must be one of the allowed values",
    column="status",
    allowed_values=["active", "inactive", "pending"]
)

Using Rules¶

Programmatically¶

from lavendertown.rules.models import RuleSet
from lavendertown.rules.executors import RangeRule
from lavendertown.detectors.rule_based import RuleBasedDetector
from lavendertown import Inspector

# Create ruleset
ruleset = RuleSet(name="my_rules", description="Custom validation rules")

# Add rules
ruleset.add_rule(RangeRule(
    name="age_range",
    description="Age must be between 0 and 120",
    column="age",
    min_value=0.0,
    max_value=120.0
))

# Create detector
rule_detector = RuleBasedDetector(ruleset)

# Use with Inspector
inspector = Inspector(df, detectors=[rule_detector])
findings = inspector.detect()

Through the UI¶

Run your Streamlit app with inspector.render()
Click "Manage Rules" in the sidebar
Click "Create New Rule"
Select rule type and configure parameters
Rules execute automatically with each analysis

RuleSet Management¶

Saving and Loading Rules¶

from lavendertown.rules.storage import save_ruleset, load_ruleset

# Save ruleset
save_ruleset(ruleset, "my_rules.json")

# Load ruleset
loaded_ruleset = load_ruleset("my_rules.json")

Exporting Rules¶

Rules can be exported to other formats:

from lavendertown.export.pandera import export_ruleset_to_pandera
from lavendertown.export.great_expectations import export_ruleset_to_great_expectations_json

# Export to Pandera
pandera_schema = export_ruleset_to_pandera(ruleset)

# Export to Great Expectations
ge_suite = export_ruleset_to_great_expectations_json(ruleset, "my_suite")

Rule Execution¶

Rules are executed by the RuleBasedDetector, which:

Validates that required columns exist
Applies each rule to the DataFrame
Returns GhostFinding objects for violations
Handles both Pandas and Polars DataFrames

Best Practices¶

Name rules clearly: Use descriptive names that explain what the rule checks
Provide descriptions: Help users understand the purpose of each rule
Test rules: Verify rules work correctly with sample data
Organize rulesets: Group related rules together in rulesets
Version control: Save rulesets to JSON files and track them in version control

Next Steps¶

Learn about Cross-Column Rules for multi-column validation
See API Reference for detailed rule documentation