roam/aweber/20210928155832-translating_the_search_dsl.org
2021-10-08 16:06:24 -04:00

Modeling the new search DSL

Defining and translating the Search DSL for the Subscriber Search Service.

Searches

A search wraps a single top-level grouping (sorting is still to be added)

  @dataclasses.dataclass
  class Search:
      group: Group
      # TODO: sorting : Sorting
  Search:
    type: object
    properties:
      group:
        $ref: "#/components/schemas/Group"

A grouping is a collection of conditions

  class GroupType(enum.Enum):
      AND = 1
      # TODO: OR = 2


  @dataclasses.dataclass
  class Group:
      group_type: GroupType
      conditions: typing.List[Condition]
  Group:
    type: object
    properties:
      group_type:
        enum:
          - "AND"
      conditions:
        type: array
        items:
          $ref: "#/components/schemas/Condition"

A condition pairs a filter with the value to match against

  @dataclasses.dataclass
  class Condition:
      filter: Filter
      match: str
  Condition:
    type: object
    properties:
      filter:
        $ref: "#/components/schemas/Filter"
      match:
        type: string

A filter is a boolean expression applied to a field with an optional argument

  class InputType(enum.Enum):
      Nothing = 1
      String = 2
      Date = 3
      Tag = 4
      TagSet = 5
      Message = 6
  
  
  @dataclasses.dataclass
  class Filter:
      operator: str
      field: Field
      input_type: InputType

A field refers to a specific database field somewhere in our system

  class Database(enum.Enum):
      AppDB = 1
      Analytics = 2
  
  
  @dataclasses.dataclass
  class FieldType:
      name: str
  
  
  @dataclasses.dataclass
  class Field:
      name: str
      column: str
      table: str
      database: Database

Available filters

Subscriber email is x

  email = Field(
      name="email",
      column="email",
      table="subscribers",
      database=Database.AppDB,
  )
  email = Filter(field=fields.email, operator="is", input_type=InputType.String)

Sample searches

Match subscriber email

  Search(
      group=Group(
          group_type=GroupType.AND,
          conditions=[Condition(filter=filters.email, match="test@example.org")],
      )
  )

SQL Generation

  def to_sql(search: Search) -> str:
      # Always select from subscribers, plus any table a condition touches.
      tables: typing.Set[str] = {"subscribers"}
      tables = tables | {
          condition.filter.field.table for condition in search.group.conditions
      }

      def condition_to_sql(condition: Condition) -> str:
          # NOTE: the match value is interpolated as-is (unquoted, unescaped).
          field = ".".join([condition.filter.field.table, condition.filter.field.column])
          return f"{field} {condition.filter.operator} {condition.match}"

      def group_to_sql(group: Group) -> str:
          # Use the group passed in, not the enclosing search's top-level group.
          operator = "AND" if group.group_type == GroupType.AND else "OR"
          clauses = f" {operator} ".join(
              [condition_to_sql(condition) for condition in group.conditions]
          )
          return f"({clauses})"

      where = group_to_sql(search.group)
      # Sort the table names so the FROM clause is deterministic.
      return f"""SELECT * FROM {', '.join(sorted(tables))} WHERE {where}"""

Decisions

DONE Should the input type presented to the end-user be tied to the database field or the conditional operator?

It seems it should be the operator: an "equals" operator matches a single value, whereas an "in" operator matches against several. That said, the input type could be parameterized by the field's type (e.g. a tag has type str, its "equals" operator has type str, its "in" operator has type List[str]).


The input type will be defined as a property of the filter being applied.
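
As a rough illustration of this decision, the sketch below attaches the input type to each filter and checks a candidate value against it. The `VALIDATORS` mapping, the `accepts` helper, and the simplified `field_name` attribute are assumptions made for the example; they are not part of the model above.

```python
import dataclasses
import enum
import typing


class InputType(enum.Enum):
    Nothing = 1
    String = 2
    TagSet = 5


@dataclasses.dataclass
class Filter:
    operator: str
    field_name: str  # simplified stand-in for the Field dataclass
    input_type: InputType


# Hypothetical mapping from input type to a predicate that checks a value.
VALIDATORS: typing.Dict[InputType, typing.Callable[[typing.Any], bool]] = {
    InputType.Nothing: lambda value: value is None,
    InputType.String: lambda value: isinstance(value, str),
    InputType.TagSet: lambda value: isinstance(value, list)
    and all(isinstance(tag, str) for tag in value),
}


def accepts(filter_: Filter, value: typing.Any) -> bool:
    """Return True if the value fits the input type declared by the filter."""
    return VALIDATORS[filter_.input_type](value)


# Same field, two operators, two different input types.
tag_is = Filter(operator="=", field_name="tag", input_type=InputType.String)
tag_in = Filter(operator="in", field_name="tag", input_type=InputType.TagSet)
```

Here the two filters share a field but differ in the shape of value they accept, which is the reason for tying the input type to the filter instead of the field.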

DONE Should the search service maintain a set of filters, or field types and operators?

  • A filter is a combination of a field, an operator, and a type
  • A field has a type, and operators could be defined that work with a type or set of types

For the former, the service would have total control over the search filters available to the UI, and the UI would be coupled to the filter collection. With the latter, the UI would have total control over which fields it's able to search on and how, provided the fields are available.


The search service will maintain a set of filters.
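
One way this could look in practice is a registry keyed by filter name, which the service uses both to resolve incoming requests and to tell the UI what is available. The names `FILTERS`, `lookup_filter`, `describe_filters`, and the `"email_is"` key are hypothetical, and the `Filter` here is flattened to plain strings to keep the sketch short.

```python
import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class Filter:
    operator: str
    field: str  # simplified: a plain name instead of the Field dataclass
    input_type: str  # simplified: a plain name instead of the InputType enum


# Hypothetical registry: the only filters the service exposes.
FILTERS: typing.Dict[str, Filter] = {
    "email_is": Filter(operator="is", field="email", input_type="String"),
}


def lookup_filter(name: str) -> Filter:
    """Resolve a filter name from a request, rejecting unknown names."""
    try:
        return FILTERS[name]
    except KeyError:
        raise ValueError(f"unknown filter: {name!r}") from None


def describe_filters() -> typing.List[typing.Dict[str, str]]:
    """List the filters in a form a UI could render (name plus input type)."""
    return [
        {"name": name, "input_type": filter_.input_type}
        for name, filter_ in sorted(FILTERS.items())
    ]
```

With this shape the UI never composes fields and operators itself; it can only pick from what `describe_filters` returns.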

TODO How should the values of each filter be represented in the request schema?

Should they be normalized to strings, or should we allow any type and validate it when we attempt to build the search data model? If the latter, could the available filters be baked into the OpenAPI schema?
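
To make the second option concrete, here is a minimal sketch of validating a raw request value while building the data model. It assumes dates arrive as ISO 8601 strings; `parse_match` is a hypothetical helper, and only three of the input types are covered.

```python
import datetime
import enum
import typing


class InputType(enum.Enum):
    Nothing = 1
    String = 2
    Date = 3


def parse_match(input_type: InputType, raw: typing.Any) -> typing.Any:
    """Validate a raw request value and convert it to its Python type."""
    if input_type is InputType.Nothing:
        if raw is not None:
            raise ValueError("this filter takes no value")
        return None
    if not isinstance(raw, str):
        raise ValueError(f"expected a string, got {type(raw).__name__}")
    if input_type is InputType.String:
        return raw
    if input_type is InputType.Date:
        # Assumption: dates are sent as ISO 8601 strings, e.g. "2021-09-28".
        # fromisoformat raises ValueError on malformed input.
        return datetime.date.fromisoformat(raw)
    raise ValueError(f"unsupported input type: {input_type}")
```

Under this approach the request schema can stay loose (any JSON value), at the cost of moving the type errors from schema validation to model construction.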

TODO How should the SQL be generated for each filter?

Should a SQL template or generation function be attached to each filter?
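
A sketch of the template option follows, under the assumption that the database driver uses `%s` placeholders (the DB-API "format" paramstyle). `SqlFilter` and `render` are hypothetical names. The value is returned as a bound parameter instead of being interpolated into the SQL string.

```python
import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class SqlFilter:
    """Hypothetical filter that carries its own SQL template."""

    # {column} is replaced by the qualified column; %s marks the bound value.
    template: str


def render(
    filter_: SqlFilter, table: str, column: str, value: typing.Any
) -> typing.Tuple[str, typing.Tuple[typing.Any, ...]]:
    """Return the WHERE clause fragment and the parameters to bind to it."""
    clause = filter_.template.format(column=f"{table}.{column}")
    return clause, (value,)


email_is = SqlFilter(template="{column} = %s")
clause, params = render(email_is, "subscribers", "email", "test@example.org")
```

Keeping the value out of the SQL text leaves quoting and escaping to the driver. A generation function per filter would offer more flexibility than a template, but is harder to inspect.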

TODO How do we want to define the joins for the various tables that may come into play?

We'll have to know, one way or another, how to narrow the records from the joined table. Will they all be joined by the subscriber id, or will we need to maintain a map?
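
If a map turns out to be necessary, it might look like the sketch below. The `subscriber_tags` table, its `subscriber_id` column, and `subscribers.id` are invented for illustration, as are `JOINS` and `from_clause`.

```python
import typing

BASE_TABLE = "subscribers"

# Hypothetical join map: table name -> ON clause linking it to subscribers.
JOINS: typing.Dict[str, str] = {
    "subscriber_tags": "subscriber_tags.subscriber_id = subscribers.id",
}


def from_clause(tables: typing.Iterable[str]) -> str:
    """Build a FROM clause that joins every extra table to the base table."""
    parts = [BASE_TABLE]
    # Sort so the generated SQL is stable across runs.
    for table in sorted(set(tables) - {BASE_TABLE}):
        if table not in JOINS:
            raise KeyError(f"no join defined for table {table!r}")
        parts.append(f"JOIN {table} ON {JOINS[table]}")
    return " ".join(parts)
```

If every table really is keyed by the subscriber id, the map collapses to a single rule and the ON clause can be generated instead of stored.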

Code

Python

  import dataclasses
  import enum
  import typing
  
  
  <<field>>
  
  
  <<filter>>
  
  
  <<condition>>
  
  
  <<group>>
  
  
  <<search>>
  
  
  <<builder>>
  
  
  class fields:
      <<fields>>
  
  
  class filters:
      <<filters>>
  
  
  searches = [
      <<searches>>,
  ]
Success: no issues found in 1 source file
Mypy analysis

OpenAPI

Output

SELECT * FROM subscribers WHERE (subscribers.email is test@example.org)
Generated queries