Modeling the new search DSL
- Searches
- A search is a collection of groupings
- A grouping is a collection of conditions
- A condition is a filter applied to a field
- A filter is a boolean expression applied to a field with an optional argument
- A field refers to a specific database field somewhere in our system
- Available filters
- Sample searches
- SQL Generation
- Decisions
- Should the input type presented to the end-user be tied to the database field or the conditional operator?
- Should the search service maintain a set of filters, or field types and operators?
- How should the values of each filter be represented in the request schema?
- How should the SQL be generated for each filter?
- How do we want to define the joins for the various tables that may come into play?
- Code
- Output
Defining and translating the Search DSL for the Subscriber Search Service.
Searches
A search is a collection of groupings
@dataclasses.dataclass
class Search:
    group: Group
    # TODO: sorting : Sorting
A grouping is a collection of conditions
class GroupType(enum.Enum):
    AND = 1
    # TODO: OR = 2

@dataclasses.dataclass
class Group:
    group_type: GroupType
    conditions: typing.List[Condition]
A condition is a filter applied to a field
@dataclasses.dataclass
class Condition:
    filter: Filter
    match: str
A filter is a boolean expression applied to a field with an optional argument
class InputType(enum.Enum):
    Nothing = 1
    String = 2
    Date = 3
    Tag = 4
    TagSet = 5
    Message = 6

@dataclasses.dataclass
class Filter:
    operator: str
    field: Field
    input_type: InputType
A field refers to a specific database field somewhere in our system
class Database(enum.Enum):
    AppDB = 1
    Analytics = 2

@dataclasses.dataclass
class FieldType:
    name: str

@dataclasses.dataclass
class Field:
    name: str
    column: str
    table: str
    database: Database
Available filters
Subscriber email is x
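# Defined inside `class fields`, so this is fields.email.
email = Field(
    name="email",
    column="email",
    table="subscribers",
    database=Database.AppDB,
)
# Defined inside `class filters`, so this is filters.email.
email = Filter(field=fields.email, operator="is", input_type=InputType.String)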
Sample searches
Match subscriber email
Search(
    group=Group(
        group_type=GroupType.AND,
        conditions=[Condition(filter=filters.email, match="test@example.org")],
    )
)
SQL Generation
def to_sql(search: Search) -> str:
    # Always select from subscribers, plus any table a condition touches.
    tables: typing.Set[str] = {"subscribers"}
    tables = tables | {
        condition.filter.field.table for condition in search.group.conditions
    }

    def condition_to_sql(condition: Condition) -> str:
        field = ".".join([condition.filter.field.table, condition.filter.field.column])
        # NOTE: the match value is interpolated as-is, unquoted and unescaped.
        return f"{field} {condition.filter.operator} {condition.match}"

    def group_to_sql(group: Group) -> str:
        # The joining keyword comes from this group's own type.
        operator = "AND" if group.group_type == GroupType.AND else "OR"
        clauses = f" {operator} ".join(
            [condition_to_sql(condition) for condition in group.conditions]
        )
        return f"({clauses})"

    where = group_to_sql(search.group)
    # Sort the table names so the FROM clause is deterministic.
    return f"""SELECT * FROM {', '.join(sorted(tables))} WHERE {where}"""
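The generator above interpolates the match value straight into the SQL text. One possible alternative, sketched here under assumptions, is to emit a placeholder per condition and return the values separately. `ParamCondition` and `to_parameterized_sql` are hypothetical names, the `%s` placeholder assumes a DB-API driver using the "format" paramstyle, and the `=` operator stands in for whatever SQL the "is" filter should map to.

```python
import dataclasses
import typing


@dataclasses.dataclass
class ParamCondition:
    # Hypothetical, flattened stand-in for Condition/Filter/Field.
    table: str
    column: str
    sql_operator: str  # a SQL operator such as "=", not the display label "is"
    match: str


def to_parameterized_sql(
    conditions: typing.List[ParamCondition],
) -> typing.Tuple[str, typing.List[str]]:
    # Sorting keeps the FROM clause deterministic; set order is not.
    tables = sorted({"subscribers"} | {c.table for c in conditions})
    # One placeholder per condition; the values travel separately.
    clauses = [f"{c.table}.{c.column} {c.sql_operator} %s" for c in conditions]
    params = [c.match for c in conditions]
    sql = f"SELECT * FROM {', '.join(tables)} WHERE ({' AND '.join(clauses)})"
    return sql, params


sql, params = to_parameterized_sql(
    [ParamCondition("subscribers", "email", "=", "test@example.org")]
)
```

The returned pair would be handed to the driver's `execute(sql, params)`, which leaves quoting and escaping of the values to the driver. An empty condition list is not handled in this sketch.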
Decisions
DONE Should the input type presented to the end-user be tied to the database field or the conditional operator?
Seems it should be the operator, as an "equals" operator would match a single value, whereas an "in" operator would match against multiple. That said, it could be parameterized by the field's type (e.g. a tag has type str, its "equals" operator has type str, its "in" operator has type List[str]).
The input type will be defined as a property of the filter being applied.
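To make the "parameterized by the field's type" idea concrete, here is a minimal sketch. `input_type_for` is a hypothetical helper, not part of the model above, and it only covers the two operators mentioned in this decision.

```python
import typing


def input_type_for(operator: str, field_type: type) -> typing.Any:
    # Hypothetical: derive the argument type from the operator,
    # parameterized by the type of the field it is applied to.
    if operator == "equals":
        return field_type  # a single value
    if operator == "in":
        return typing.List[field_type]  # several values of the field's type
    raise ValueError(f"unknown operator: {operator}")


tag_equals = input_type_for("equals", str)
tag_in = input_type_for("in", str)
```

Under this reading, the `input_type` stored on a `Filter` could be computed from its operator and field rather than declared by hand for each filter.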
TODO Should the search service maintain a set of filters, or field types and operators?
- A filter is a combination of a field, an operator, and a type
- A field has a type, and operators could be defined that work with a type or set of types
For the former, the service would have total control over the search filters available to the UI, and the UI would be coupled to the filter collection. With the latter, the UI would have total control over which fields it's able to search on and how, provided the fields are available.
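The second option can be sketched as follows, assuming each field carries a type and each operator declares the types it accepts. All names here (`FieldKind`, `TypedField`, `Operator`, `available_filters`, the `created_at` field, and the `before` operator) are hypothetical and only illustrate the shape of the idea.

```python
import dataclasses
import enum
import typing


class FieldKind(enum.Enum):
    # Hypothetical field types; not the FieldType dataclass above.
    STRING = 1
    DATE = 2


@dataclasses.dataclass(frozen=True)
class TypedField:
    name: str
    kind: FieldKind


@dataclasses.dataclass(frozen=True)
class Operator:
    name: str
    accepts: typing.FrozenSet[FieldKind]


def available_filters(
    fields: typing.List[TypedField], operators: typing.List[Operator]
) -> typing.List[typing.Tuple[str, str]]:
    # Every (field, operator) pair whose types line up is a usable filter.
    return [
        (field.name, op.name)
        for field in fields
        for op in operators
        if field.kind in op.accepts
    ]


typed_fields = [
    TypedField("email", FieldKind.STRING),
    TypedField("created_at", FieldKind.DATE),
]
operators = [
    Operator("is", frozenset({FieldKind.STRING, FieldKind.DATE})),
    Operator("before", frozenset({FieldKind.DATE})),
]
pairs = available_filters(typed_fields, operators)
```

With this design the filter set is derived rather than maintained, so adding a field makes every compatible operator available to the UI without further work in the service.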
TODO How should the values of each filter be represented in the request schema?
Should they be normalized to strings, or should we allow any type and validate it when we attempt to build the search data model? If the latter, could the available filters be baked into the OpenAPI schema?
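A minimal sketch of the "allow any type and validate when building the model" option follows. `parse_value` is a hypothetical helper, the `InputType` enum is redefined with a subset of the members above so the block runs on its own, and ISO 8601 dates are an assumption about the wire format.

```python
import datetime
import enum
import typing


class InputType(enum.Enum):
    # Subset of the InputType enum above, repeated so this block is standalone.
    Nothing = 1
    String = 2
    Date = 3


def parse_value(input_type: InputType, raw: typing.Any) -> typing.Any:
    # Validate the raw request value against the filter's declared input type.
    if input_type is InputType.Nothing:
        if raw is not None:
            raise ValueError("this filter takes no argument")
        return None
    if input_type is InputType.String:
        if not isinstance(raw, str):
            raise ValueError(f"expected a string, got {type(raw).__name__}")
        return raw
    if input_type is InputType.Date:
        if not isinstance(raw, str):
            raise ValueError(f"expected an ISO date string, got {type(raw).__name__}")
        # fromisoformat raises ValueError on malformed dates.
        return datetime.date.fromisoformat(raw)
    raise ValueError(f"unsupported input type: {input_type}")


parsed_date = parse_value(InputType.Date, "2024-01-31")
```

Whether the same constraints can also be expressed in the OpenAPI schema is left open here; this only shows validation at model-build time.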
TODO How should the SQL be generated for each filter?
Should a SQL template or generation function be attached to each filter?
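One way the "generation function attached to each filter" option could look is sketched below. `SqlFilter`, the render functions, and the "is empty" filter are hypothetical, and the `%s` placeholder again assumes a DB-API driver using the "format" paramstyle.

```python
import dataclasses
import typing

SqlFragment = typing.Tuple[str, typing.List[str]]


def render_equals(column: str, match: typing.Optional[str]) -> SqlFragment:
    # One placeholder, one parameter.
    return f"{column} = %s", [match]


def render_is_null(column: str, match: typing.Optional[str]) -> SqlFragment:
    # Takes no argument, like a filter whose input type is Nothing.
    return f"{column} IS NULL", []


@dataclasses.dataclass
class SqlFilter:
    # Hypothetical variant of Filter that carries its own SQL generator.
    operator: str
    column: str  # qualified name, e.g. "subscribers.email"
    render: typing.Callable[[str, typing.Optional[str]], SqlFragment]


def filter_to_sql(f: SqlFilter, match: typing.Optional[str] = None) -> SqlFragment:
    return f.render(f.column, match)


email_is = SqlFilter("is", "subscribers.email", render_equals)
email_is_empty = SqlFilter("is empty", "subscribers.email", render_is_null)
```

A function per filter is more flexible than a string template, since filters that take no argument or several arguments fit the same interface.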
TODO How do we want to define the joins for the various tables that may come into play?
We'll have to know, one way or another, how to narrow the records from the joined table. Will they all be joined by the subscriber id, or will we need to maintain a map?
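If a map turns out to be necessary, it could be as small as the sketch below. The `tags` and `messages` tables, their `subscriber_id` columns, and the `subscribers.id` key are all assumptions made for illustration; `JOINS` and `from_clause` are hypothetical names.

```python
import typing

# Hypothetical join map: table name -> ON condition linking it to subscribers.
JOINS: typing.Dict[str, str] = {
    "tags": "tags.subscriber_id = subscribers.id",
    "messages": "messages.subscriber_id = subscribers.id",
}


def from_clause(tables: typing.Iterable[str]) -> str:
    # subscribers is always the base table; every other table needs a map entry.
    parts = ["subscribers"]
    for table in sorted(set(tables) - {"subscribers"}):
        # A KeyError here means the table has no known join to subscribers.
        parts.append(f"JOIN {table} ON {JOINS[table]}")
    return " ".join(parts)


joined = from_clause({"subscribers", "tags"})
```

If every table really does join on the subscriber id, the map collapses to a naming convention and could be dropped.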
Code
import dataclasses
import enum
import typing

<<field>>
<<filter>>
<<condition>>
<<group>>
<<search>>
<<builder>>

class fields:
    <<fields>>

class filters:
    <<filters>>

searches = [
    <<searches>>,
]
Output
SELECT * FROM subscribers WHERE (subscribers.email is test@example.org)