Custom Rules¶
Version
Custom rules and cross-column rules were both introduced in v0.2.0.
LavenderTown allows you to create custom data quality rules beyond the built-in detectors. Rules can be created through the UI or programmatically.
Rule Types¶
Range Rules¶
Validate that numeric values fall within a specified range:
from lavendertown.rules.executors import RangeRule
rule = RangeRule(
    name="price_range",
    description="Price must be between 0 and 1000",
    column="price",
    min_value=0.0,
    max_value=1000.0
)
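To make the check concrete, the sketch below reproduces the idea of a range rule in plain Python. It is not LavenderTown's implementation: the helper `find_out_of_range` is hypothetical, and it assumes that both bounds are inclusive and that missing values (`None`) are skipped, which the library may handle differently.

```python
from typing import List, Optional, Sequence


def find_out_of_range(
    values: Sequence[Optional[float]],
    min_value: float,
    max_value: float,
) -> List[int]:
    """Return the positions of values outside [min_value, max_value].

    Hypothetical helper for illustration only. Assumes inclusive bounds
    and skips None entries.
    """
    return [
        index
        for index, value in enumerate(values)
        if value is not None and not (min_value <= value <= max_value)
    ]


# Sample "price" column with two out-of-range entries and one missing value.
prices = [19.99, -5.0, 1000.0, None, 2500.0]
violations = find_out_of_range(prices, min_value=0.0, max_value=1000.0)
print(violations)  # positions of -5.0 and 2500.0
```

Running a check like this on a small sample is a quick way to confirm that the bounds you chose flag the rows you expect before attaching the rule to a ruleset.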
Regex Rules¶
Validate string patterns using regular expressions:
from lavendertown.rules.executors import RegexRule
rule = RegexRule(
    name="email_format",
    description="Email must match standard format",
    column="email",
    pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
)
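Before relying on a pattern, it can help to try it against a few sample values with Python's standard `re` module. The sketch below uses the same pattern as the `email_format` rule; the sample strings are made up for illustration, and how `RegexRule` applies the pattern internally is not shown here. Because the pattern is anchored with `^` and `$`, it describes the whole value rather than a substring.

```python
import re

# Same pattern as the email_format rule above.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

# Made-up sample values: two well-formed addresses and two malformed ones.
samples = [
    "alice@example.com",
    "bob@example",
    "carol.smith+news@mail.example.org",
    "no-at-sign.com",
]

# Map each sample to True (matches) or False (would be flagged).
results = {sample: bool(EMAIL_PATTERN.match(sample)) for sample in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {'matches' if ok else 'does not match'}")
```

Here `bob@example` fails because the pattern requires a dot followed by at least two letters after the domain, and `no-at-sign.com` fails because it has no `@`.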
Enum Rules¶
Validate that values are from an allowed set:
from lavendertown.rules.executors import EnumRule
rule = EnumRule(
    name="valid_status",
    description="Status must be one of the allowed values",
    column="status",
    allowed_values=["active", "inactive", "pending"]
)
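Conceptually, an enum check is a set-membership test. The plain-Python sketch below is illustrative only and assumes exact, case-sensitive comparison; whether `EnumRule` normalizes case or whitespace is not stated here, so verify that against your own data.

```python
# Allowed values from the valid_status rule above.
allowed_values = {"active", "inactive", "pending"}

# Made-up sample column; "Active" differs from "active" only by case.
statuses = ["active", "Active", "pending", "archived"]

# Under exact comparison, both "Active" and "archived" are rejected.
invalid = [status for status in statuses if status not in allowed_values]
print(invalid)
```

If values such as `"Active"` should pass, normalizing the column (for example, lowercasing it) before validation is one option.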
Using Rules¶
Programmatically¶
from lavendertown.rules.models import RuleSet
from lavendertown.rules.executors import RangeRule
from lavendertown.detectors.rule_based import RuleBasedDetector
from lavendertown import Inspector
# Create a ruleset
ruleset = RuleSet(name="my_rules", description="Custom validation rules")
# Add rules
ruleset.add_rule(RangeRule(
    name="age_range",
    description="Age must be between 0 and 120",
    column="age",
    min_value=0.0,
    max_value=120.0
))
# Create the detector
rule_detector = RuleBasedDetector(ruleset)
# Run the rules through the Inspector
# (df is your existing Pandas or Polars DataFrame)
inspector = Inspector(df, detectors=[rule_detector])
findings = inspector.detect()
Through the UI¶
- Run your Streamlit app with inspector.render()
- Click "Manage Rules" in the sidebar
- Click "Create New Rule"
- Select rule type and configure parameters
- Rules execute automatically with each analysis
RuleSet Management¶
Saving and Loading Rules¶
from lavendertown.rules.storage import save_ruleset, load_ruleset
# Save ruleset
save_ruleset(ruleset, "my_rules.json")
# Load ruleset
loaded_ruleset = load_ruleset("my_rules.json")
Exporting Rules¶
Rules can be exported to other formats:
from lavendertown.export.pandera import export_ruleset_to_pandera
from lavendertown.export.great_expectations import export_ruleset_to_great_expectations_json
# Export to Pandera
pandera_schema = export_ruleset_to_pandera(ruleset)
# Export to Great Expectations
ge_suite = export_ruleset_to_great_expectations_json(ruleset, "my_suite")
Rule Execution¶
Rules are executed by the RuleBasedDetector, which:
- Validates that required columns exist
- Applies each rule to the DataFrame
- Returns GhostFinding objects for violations
- Handles both Pandas and Polars DataFrames
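The flow described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the RuleBasedDetector source: `SketchRule`, `SketchFinding`, and `run_rules` are hypothetical names, the table is a plain dict of columns rather than a DataFrame, and reporting a missing column as a finding is an assumption (the library may raise an error or skip the rule instead).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SketchRule:
    """Hypothetical stand-in for a rule: a column plus a validity check."""
    name: str
    column: str
    is_valid: Callable[[object], bool]


@dataclass
class SketchFinding:
    """Hypothetical stand-in for a finding; not the GhostFinding class."""
    rule_name: str
    column: str
    row_indices: List[int]
    message: str


def run_rules(table: Dict[str, list], rules: List[SketchRule]) -> List[SketchFinding]:
    findings = []
    for rule in rules:
        # Step 1: check that the required column exists (assumed behavior:
        # report it and move on to the next rule).
        if rule.column not in table:
            findings.append(SketchFinding(rule.name, rule.column, [], "column not found"))
            continue
        # Step 2: apply the rule to every value in the column.
        bad_rows = [
            index
            for index, value in enumerate(table[rule.column])
            if not rule.is_valid(value)
        ]
        # Step 3: report violations as a finding.
        if bad_rows:
            findings.append(SketchFinding(rule.name, rule.column, bad_rows, "rule violated"))
    return findings


# Made-up sample data; there is deliberately no "price" column.
table = {"age": [34, -2, 150], "status": ["active", "unknown", "pending"]}
rules = [
    SketchRule("age_range", "age", lambda v: 0 <= v <= 120),
    SketchRule("valid_status", "status", lambda v: v in {"active", "inactive", "pending"}),
    SketchRule("price_range", "price", lambda v: 0 <= v <= 1000),
]

findings = run_rules(table, rules)
for finding in findings:
    print(finding)
```

Keeping each rule as a small, independent check is what lets a detector apply the same loop to any mix of range, regex, and enum rules.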
Best Practices¶
- Name rules clearly: Use descriptive names that explain what the rule checks
- Provide descriptions: Help users understand the purpose of each rule
- Test rules: Verify rules work correctly with sample data
- Organize rulesets: Group related rules together in rulesets
- Version control: Save rulesets to JSON files and track them in version control
Next Steps¶
- Learn about Cross-Column Rules for multi-column validation
- See API Reference for detailed rule documentation