:PROPERTIES:
:ID:       7b0f97f3-9037-4d05-9170-a478e97c8d1f
:END:
#+title: Modeling the new search DSL

Defining and translating the Search DSL for the [[id:11edd6c9-b976-403b-a419-b5542ddedaae][Subscriber Search Service]].

* Searches

** A search is a collection of groupings
#+begin_src python :noweb-ref search
@dataclasses.dataclass
class Search:
    group: Group
    # TODO: sorting: Sorting
#+end_src

** A grouping is a collection of conditions
#+begin_src python :noweb-ref group
class GroupType(enum.Enum):
    AND = 1
    # TODO: OR = 2


@dataclasses.dataclass
class Group:
    group_type: GroupType
    conditions: typing.List[Condition]
#+end_src

** A condition is a filter applied to a field
#+begin_src python :noweb-ref condition
@dataclasses.dataclass
class Condition:
    filter: Filter
    match: str
#+end_src

** A filter is a boolean expression applied to a field with an optional argument
#+begin_src python :noweb-ref filter
class InputType(enum.Enum):
    Nothing = 1
    String = 2
    Date = 3
    Tag = 4
    TagSet = 5
    Message = 6


@dataclasses.dataclass
class Filter:
    operator: str
    field: Field
    input_type: InputType
#+end_src

** A field refers to a specific database field somewhere in our system
#+begin_src python :noweb-ref field
class Database(enum.Enum):
    AppDB = 1
    Analytics = 2


@dataclasses.dataclass
class FieldType:
    name: str


@dataclasses.dataclass
class Field:
    name: str
    column: str
    table: str
    database: Database
#+end_src

** Available filters

*** Subscriber email is x
#+begin_src python :noweb-ref fields
email = Field(
    name="email",
    column="email",
    table="subscribers",
    database=Database.AppDB,
)
#+end_src

#+begin_src python :noweb-ref filters
email = Filter(field=fields.email, operator="is", input_type=InputType.String)
#+end_src

** Sample searches

*** Match subscriber email
#+begin_src python :noweb-ref searches
Search(
    group=Group(
        group_type=GroupType.AND,
        conditions=[Condition(filter=filters.email, match="test@example.org")],
    )
)
#+end_src

* SQL Generation

#+begin_src python :noweb-ref builder
def to_sql(search: Search) -> str:
    # Always select from subscribers, plus any table referenced by a condition.
    tables: typing.Set[str] = {"subscribers"}
    tables = tables | {
        condition.filter.field.table for condition in search.group.conditions
    }

    def condition_to_sql(condition: Condition) -> str:
        field = ".".join([condition.filter.field.table, condition.filter.field.column])
        # The match value is interpolated as-is (no quoting or escaping yet).
        return f"{field} {condition.filter.operator} {condition.match}"

    def group_to_sql(group: Group) -> str:
        # Use the group passed in, not the enclosing search's top-level group.
        operator = "AND" if group.group_type == GroupType.AND else "OR"
        clauses = f" {operator} ".join(
            [condition_to_sql(condition) for condition in group.conditions]
        )
        return f"({clauses})"

    where = group_to_sql(search.group)

    return f"""SELECT * FROM {', '.join(tables)} WHERE {where}"""
#+end_src

* Decisions

** DONE Should the input type presented to the end-user be tied to the database field or the conditional operator?
It seems it should be the operator, as an "equals" operator would match a single value, whereas an "in" operator would match against multiple values. That said, it could be /parameterized/ by the field's type (e.g. a tag has type =str=, its "equals" operator has type =str=, its "in" operator has type =List[str]=).

--------------------------------------------------------------------------------

The input type will be defined as a property of the filter being applied.

** TODO Should the search service maintain a set of filters, or field types and operators?
- A filter is a combination of a field, an operator, and a type
- A field has a type, and operators could be defined that work with a type or set of types

For the former, the service would have total control over the search filters available to the UI, and the UI would be coupled to the filter collection. With the latter, the UI would have total control over which fields it's able to search on and how, provided the fields are available.

** TODO How should the values of each filter be represented in the request schema?
Should they be normalized to strings, or should we allow any type and validate it when we attempt to build the search data model? If the latter, could the available filters be baked into the OpenAPI schema?

** TODO How should the SQL be generated for each filter?
Should a SQL template or generation function be attached to each filter?

** TODO How do we want to define the joins for the various tables that may come into play?
We'll have to know, one way or another, how to narrow the records from the joined table. Will they all be joined by the subscriber id, or will we need to maintain a map?

* Code

#+begin_src python :noweb yes :noweb-ref final :exports code :results silent
import dataclasses
import enum
import typing

<<field>>

<<filter>>

<<condition>>

<<group>>

<<search>>

<<builder>>


class fields:
    <<fields>>


class filters:
    <<filters>>


searches = [
    <<searches>>,
]
#+end_src

#+RESULTS:

#+caption: Mypy analysis
#+begin_src bash :noweb yes :results output :exports results
mypy <(cat <<'EOF'
<<final>>
EOF
) 2>&1 || true
#+end_src

#+RESULTS:
: Success: no issues found in 1 source file

* Output

#+caption: Generated queries
#+begin_src python :noweb yes :exports results
<<final>>

return [[to_sql(search)] for search in searches]
#+end_src

#+RESULTS:
| SELECT * FROM subscribers WHERE (subscribers.email is test@example.org) |
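
The generated query above interpolates the match value unquoted and uses =is= as the comparison, so it is not valid SQL as written. One possible direction for the open question of attaching a SQL template to each filter is sketched here. It is only a sketch under assumptions: =SqlFilter=, =render=, and the =template= attribute are hypothetical names that are not part of the model above, and the =?= placeholder assumes a driver with the DB-API "qmark" parameter style (which =sqlite3= uses). The in-memory SQLite table exists only to show the query executing.

#+begin_src python
import dataclasses
import sqlite3
import typing


@dataclasses.dataclass
class SqlFilter:
    # Hypothetical stand-in for Filter: where the field lives, plus a SQL
    # template with a {field} slot and a "?" placeholder for the match value.
    table: str
    column: str
    template: str


def render(sql_filter: SqlFilter, match: str) -> typing.Tuple[str, typing.List[str]]:
    # Return the SQL clause and its bound parameters separately, so the
    # match value is never spliced into the SQL text.
    field = f"{sql_filter.table}.{sql_filter.column}"
    return sql_filter.template.format(field=field), [match]


email_is = SqlFilter(table="subscribers", column="email", template="{field} = ?")
clause, params = render(email_is, "test@example.org")
query = f"SELECT * FROM subscribers WHERE ({clause})"

# Throwaway in-memory database, used only to demonstrate execution.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT)")
conn.execute("INSERT INTO subscribers VALUES (?)", ("test@example.org",))
conn.execute("INSERT INTO subscribers VALUES (?)", ("other@example.org",))
rows = conn.execute(query, params).fetchall()
conn.close()
#+end_src

Keeping the parameters apart from the clause text leaves quoting and escaping to the database driver. A generation function per filter would fit the same shape if it returned a clause and parameter list instead of formatting a template.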