Models¶
Core data models for ghost findings.
GhostFinding¶
Represents a single data quality issue detected in a dataset.
GhostFinding dataclass ¶
Represents a single data quality issue (ghost) detected in a dataset.
GhostFindings are the primary output of data quality detectors. They encapsulate all information about a detected issue in a structured format that can be displayed in the UI, exported to various formats, or processed programmatically.
Attributes:
| Name | Type | Description |
|---|---|---|
| ghost_type | str | Category of the ghost/issue. Common types include "null" (excessive null values), "type" (type inconsistencies), "outlier" (statistical outliers), "drift" (schema or distribution drift), and "rule" (custom rule violations). |
| column | str | Name of the affected column. Empty string for issues that don't relate to a specific column. |
| severity | str | Severity level of the issue. Valid values are "info" (informational, minor issue), "warning" (may need attention), and "error" (requires immediate attention). |
| description | str | Human-readable description of the issue, suitable for display in UI or reports. |
| row_indices | list[int] \| None | Optional list of 0-based row indices affected by the issue. None if specific row indices are not available or not applicable. For Polars DataFrames this is often None, because Polars has no row-index concept. |
| metadata | dict[str, Any] | Additional context as key-value pairs: detector-specific information such as statistics, thresholds, or diagnostics. Empty dict by default. |
Example
Create a finding manually::

    finding = GhostFinding(
        ghost_type="null",
        column="email",
        severity="warning",
        description="Column 'email' has 15 null values (25% of 60 rows)",
        row_indices=[2, 5, 8, 12, 15],
        metadata={"null_count": 15, "null_percentage": 0.25},
    )

Convert to/from dict for serialization::

    # Serialize
    data = finding.to_dict()
    # Deserialize
    finding = GhostFinding.from_dict(data)
Source code in lavendertown/models.py
Functions¶
__post_init__ ¶
Validate severity values after initialization.
Ensures that the severity attribute is one of the valid values. Called automatically by the dataclass-generated __init__ once all fields have been assigned.
Raises:
| Type | Description |
|---|---|
| ValueError | If severity is not one of "info", "warning", or "error". |
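The validation described above can be sketched with a stand-in dataclass. `FindingSketch` and `VALID_SEVERITIES` are hypothetical names used only for illustration, and the error message is invented; the actual code in lavendertown/models.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: the three severities listed in the Attributes table.
VALID_SEVERITIES = ("info", "warning", "error")


@dataclass
class FindingSketch:
    """Hypothetical stand-in for GhostFinding, for illustration only."""

    ghost_type: str
    column: str
    severity: str
    description: str
    row_indices: Optional[list[int]] = None
    # default_factory gives every instance its own empty dict.
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Runs at the end of the dataclass-generated __init__.
        if self.severity not in VALID_SEVERITIES:
            raise ValueError(
                f"Invalid severity {self.severity!r}; "
                f"expected one of {VALID_SEVERITIES}"
            )
```

With a check like this, an unsupported severity such as "critical" fails at construction time rather than later, when the finding is displayed or exported.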
from_dict classmethod ¶
Create finding from dictionary.
Deserializes a dictionary back into a GhostFinding instance. Used for loading findings from JSON files or retrieving cached findings from Streamlit's cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary containing finding data. Must include the required fields: "ghost_type", "column", "severity", and "description". Optional fields "row_indices" and "metadata" will use defaults if not present. | required |
Returns:
| Type | Description |
|---|---|
| 'GhostFinding' | GhostFinding instance initialized with data from the dictionary. |
Raises:
| Type | Description |
|---|---|
| ValueError | If required fields ("ghost_type", "column", "severity", "description") are missing from the dictionary. |
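The required-field check and the optional-field defaults can be sketched as a small helper. `normalize_finding_dict` and `REQUIRED_FIELDS` are hypothetical names, not part of the library, and the defaults (None for "row_indices", an empty dict for "metadata") are assumptions drawn from the Attributes table.

```python
from typing import Any

# The four fields that the documentation lists as required.
REQUIRED_FIELDS = ("ghost_type", "column", "severity", "description")


def normalize_finding_dict(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: validate required keys and fill in defaults."""
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
    normalized = {name: data[name] for name in REQUIRED_FIELDS}
    # Assumed defaults for the optional fields.
    normalized["row_indices"] = data.get("row_indices")
    normalized["metadata"] = data.get("metadata", {})
    return normalized
```

Collecting all missing names before raising lets the error message report every absent field in one pass, which helps when loading hand-edited JSON.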
Example
Deserialize from JSON::
    import json
    json_str = '{"ghost_type": "null", "column": "email", ...}'
    data = json.loads(json_str)
    finding = GhostFinding.from_dict(data)
to_dict ¶
Convert finding to dictionary for serialization.
Converts the GhostFinding instance to a dictionary representation suitable for JSON serialization or caching. All fields are included in the dictionary.
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary containing all finding attributes. Keys match the dataclass field names: "ghost_type", "column", "severity", "description", "row_indices", and "metadata". |
Example
Serialize for JSON export::
    import json
    finding = GhostFinding(...)
    data = finding.to_dict()
    json_str = json.dumps(data)
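A full round trip through JSON can be sketched with a stand-in class. `RoundTripFinding` is hypothetical; `dataclasses.asdict` and keyword unpacking are used here in place of `to_dict` and `from_dict`, whose actual implementations are not shown in this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class RoundTripFinding:
    """Hypothetical stand-in for GhostFinding; not the library class."""

    ghost_type: str
    column: str
    severity: str
    description: str
    row_indices: Optional[list[int]] = None
    metadata: dict[str, Any] = field(default_factory=dict)


original = RoundTripFinding(
    ghost_type="null",
    column="email",
    severity="warning",
    description="Column 'email' has 15 null values (25% of 60 rows)",
    row_indices=[2, 5, 8, 12, 15],
    metadata={"null_count": 15, "null_percentage": 0.25},
)

# asdict() stands in for to_dict(): one key per dataclass field.
data = asdict(original)
json_str = json.dumps(data)

# Rebuilding from the parsed JSON stands in for from_dict().
restored = RoundTripFinding(**json.loads(json_str))
```

The round trip only works when every value in metadata is JSON-serializable; `json.dumps` raises TypeError for unsupported objects, so detector-specific values may need converting to plain Python types first.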