Cross-Column Rules¶
Cross-column rules validate relationships between multiple columns, enabling complex business logic validation.
Overview¶
Cross-column rules check relationships across columns, such as: - Equality between columns - Arithmetic relationships (sum, difference) - Conditional logic (if-then rules) - Referential integrity
Rule Operations¶
Equals¶
Check that two columns have equal values:
from lavendertown.rules.cross_column import CrossColumnRule
rule = CrossColumnRule(
name="name_equality",
description="First name and last name must match",
source_columns=["first_name", "last_name"],
operation="equals"
)
Comparison Operations¶
Check that one column is greater than or less than another:
# Greater than
rule = CrossColumnRule(
name="price_check",
description="Total must be greater than subtotal",
source_columns=["total", "subtotal"],
operation="greater_than"
)
# Less than
rule = CrossColumnRule(
name="discount_check",
description="Discount must be less than price",
source_columns=["discount", "price"],
operation="less_than"
)
Arithmetic Operations¶
Validate arithmetic relationships:
# Sum equals
rule = CrossColumnRule(
name="subtotal_check",
description="Subtotal must equal quantity * unit_price",
source_columns=["quantity", "unit_price"],
operation="sum_equals",
target_column="subtotal"
)
Conditional Logic¶
Implement if-then validation:
rule = CrossColumnRule(
name="payment_date_check",
description="If status is 'completed', payment_date must be set",
source_columns=["status", "payment_date"],
operation="conditional",
condition={
"if_column": "status",
"if_value": "completed",
"then_column": "payment_date",
"then_value": "not_null" # Special value for null check
}
)
Referential Integrity¶
Validate that values in one column exist in another:
rule = CrossColumnRule(
name="category_referential",
description="Category values must exist in valid_categories",
source_columns=["category"],
operation="referential",
target_column="valid_categories"
)
Using Cross-Column Rules¶
Programmatically¶
from lavendertown.rules.models import RuleSet
from lavendertown.rules.cross_column import CrossColumnRule
from lavendertown.detectors.rule_based import RuleBasedDetector
from lavendertown import Inspector
# Create ruleset
ruleset = RuleSet(name="cross_column_rules", description="Multi-column validation")
# Add cross-column rule
ruleset.add_rule(CrossColumnRule(
name="total_validation",
description="Total must equal sum of line items",
source_columns=["item1", "item2", "item3"],
operation="sum_equals",
target_column="total"
))
# Use with Inspector
rule_detector = RuleBasedDetector(ruleset)
inspector = Inspector(df, detectors=[rule_detector])
findings = inspector.detect()
Through the UI¶
- Run your Streamlit app with
inspector.render() - Click "Manage Rules" in the sidebar
- Click "Create New Rule"
- Select "Cross-Column" rule type
- Configure source columns, operation, and target column
- Rules execute automatically with each analysis
Supported Operations¶
| Operation | Description | Source Columns | Target Column |
|---|---|---|---|
equals |
Column A equals Column B | 2 | No |
greater_than |
Column A > Column B | 2 | No |
less_than |
Column A < Column B | 2 | No |
sum_equals |
Sum of source columns equals target | 2+ | Yes |
conditional |
If condition met, then check another | 2+ | No |
referential |
Values in source exist in target | 1 | Yes |
Examples¶
Order Validation¶
ruleset = RuleSet(name="order_validation")
# Subtotal check
ruleset.add_rule(CrossColumnRule(
name="subtotal",
description="Subtotal must equal quantity * unit_price",
source_columns=["quantity", "unit_price"],
operation="sum_equals",
target_column="subtotal"
))
# Total check
ruleset.add_rule(CrossColumnRule(
name="total",
description="Total must be less than or equal to subtotal",
source_columns=["total", "subtotal"],
operation="less_than"
))
Conditional Validation¶
# If order is completed, payment date must be set
ruleset.add_rule(CrossColumnRule(
name="payment_required",
description="Completed orders must have payment date",
source_columns=["status", "payment_date"],
operation="conditional",
condition={
"if_column": "status",
"if_value": "completed",
"then_column": "payment_date",
"then_value": "not_null"
}
))
Best Practices¶
- Use descriptive names: Clearly indicate what relationship is being validated
- Validate column existence: Ensure all referenced columns exist in your data
- Handle nulls: Consider how null values should be handled in your rules
- Test thoroughly: Verify rules work correctly with various data scenarios
- Document business logic: Use descriptions to explain why rules exist
Next Steps¶
- Learn about Custom Rules for single-column validation
- See API Reference for detailed documentation