:PROPERTIES:
:ID:       a20323e3-fc41-496c-8acb-cf62cdb3ba27
:END:
#+title: 2021-11-04
* Backend BOF
- Scott M
- Kevin V
- Josh E
  - Pycares was failing to install due to a dependency (safety virtual
    environment did not have wheel)
- Josh B
  - Redis issue in sessions related to rescheduling Kubernetes worker nodes
  - May need to change how we run certain Redis instances in k8s; the old
    workload took time to shut down, the new instance read old data, and
    failures ensued.
  - We need some measure of fault tolerance / HA
  - What's a reasonable way to run a 3-pod Redis that would be resilient to
    that kind of failure?
  - Be really clear about the use cases and failure cases of Redis or any
    persistent store in k8s.
  - Redis is usually used as a cache or a database. Applications should be
    able to work without a cache; databases should be durable.
  - (Dave S) We're treating Redis like a cache, but failing to consider
    connection timeouts and failures (blocking connect). Lack of data didn't
    affect the service, but lack of connectivity did.
  - (Amber H) We have existing VMs, is that an option?
    - (Josh B) It is. There are challenges with client behaviors using HA Redis
      and handling failover. VMs aren't my first choice.
    - (Gavin R) We should focus on figuring out a workable k8s solution first.
      The underlying storage solution is in essence the same thing.
    - (Dave S) Aioredis doesn't have direct support for sharding or replication
      (it needs an exception-handling wrapper). The same goes for TRedis.
    - (Gavin R) A fork of Redis (https://keydb.dev/) implements transparent
      clustering, which directs the client to the correct instance.
    - (Josh B) Also has multi-master
    - (Dave S) The trouble is that writes result in a denial response, leaving
      it to the client to find the correct instance it can write to.
    - (Josh B) HAProxy could be told to speak the Redis protocol to find the
      primary and send traffic there, for clients that can't handle the
      read-only response well.
  - (Dave S) In k8s, even when running as a cache, Redis will occasionally (4-5
    times in the past year) fail to write to disk when there's some confusion
    between Ceph and the underlying mount.
    - (Josh B) This was a lower-level Ceph issue: a race condition specific to
      our configuration.
  - I will take time to put a Redis pattern together
- Ihar H
  - Recently got feedback on updating one of the service pipelines to meet
    standards, and want to clarify expectations. We'd decided to split the
    stages.
    - (Gavin R) I believe you can accomplish what you want to accomplish without
      creating so many explicit stages
    - (Ihar H) You have to click on a pipeline stage to see whether one of its
      nested tasks failed, and later stages could run without the prior one
      passing.
      - (Amber H / Gavin R) We do need to sometimes get deploys out even when
        acceptance tests fail
    - (Ihar H) Even if code is deployed to production, the pipeline will be marked
      as failed in some cases
    - (Gavin R) Your proposal is fine, we need to clarify best practices
      documentation to cover all cases.
    - (Dave S) GitLab visualization has changed over time; we can see if
      anything can be improved to make things visually clearer.
    - (Gavin R) Since we are using immutable image building, there's little
      value in running unit and integration tests in staging
      - (Dave S) ... they /shouldn't/
    - (Gavin R) The prior method tied stages to environments, which we don't
      want to do; they are different things.
    - (Ihar H) I will update the Confluence page and follow up with Dave S
    - (Amber H) I want to see more of the /why/ in that document
- Gavin R
  - Update on my experience with psycopg3: I fixed the bug I found and have
    been using it pretty heavily. It seems to work well. The API is backwards
    compatible and has also evolved; idiomatic row factory usage is different.
    One of the cool things is that if you use dataclasses, you can use a
    dataclass row factory.
  - (Amber H) We just need them and Pydantic to have a baby
  - (Gavin R) That probably wouldn't be too hard to do
  - (Andrew R) I really don't like what has to be done to pass values to the
    =IN= keyword. In Psycopg, you have to format the comma-separated list
    yourself.
    - (Gavin R) You can specify an array
    - (Andrew R) That works with ~= ANY (...)~, but is much less performant than
      using =IN=.
    - (Alex C) A newer version of Postgres may fix the performance issue
- Eric T
- David R
  - Been doing a lot of front-end experimentation with module federation to
    power our micro-frontend (MFE) architecture for our React clients, and to
    build an app shell that composes them into a single MFA. We should discuss
    soon what it'd look like to deploy something like that. Grossly
    simplifying, every MFE exposes a =remoteEntry.js= file as part of its
    build, and the shell application's configuration needs to say where that
    file lives so it can load the module when necessary. The route will then
    use the remote to load the module and its dependencies on demand. I
    believe we just need that remote to be built and available as part of our
    deployment strategy. I want to get the POC I've been working on into
    staging so it works outside of local development.
    - (Gavin R) This is the effort to replace Sites as our application shell? (Yes)
      - https://confluence.aweber.io/display/BETL/Decommissioning+Sites
      - (Gavin R) The larger implication to backend is sessions and sites going
        away and the front-end application using the Public API going forward.
      - (Scott M) Something will need to fetch AWVars data to provide to the
        applications, from the session service for now, as a dynamic service
        hosting the shell application?
      - (David R) It would just be static content / JavaScript, provided it is
        able to discover the contained applications.
      - (Gavin R) JS should be on aweber-static, the shell application should be
        hosted on something TBD, and we'll need to bridge the transition from
        using sessions for stored data to something else.
      - (David R) I see the session service as a middle state (stepping stone)
        to work towards no longer needing it.
      - (Scott M) We should look into how it can get direct access to the
        session service
        - (Gavin R) I don't think we want that to be a dependency. We need a
          global state managed in a better way
        - (Dave S) We should start putting thought into how we're using local
          storage, etc. in a consistent way so data can be cached and not
          re-fetched, avoiding having each service have its own data model.
        - (Josh B) The shell app could provide a common interface to access
          shared cache data.
    - (David R) Let's make sure there isn't an assumption that the scout file
      and the remote entry file are the same thing.
- Dave S
  - Had something, will post it. Updating docs on the AWeber API and what's
    behind it.
    - Updated endpoint map on
      https://confluence.aweber.io/display/STD/api.aweber.com+Endpoint+Management
  - We need someone in the company who really understands CORS. We're having
    CORS failures with companies like Facebook, etc. If anyone has experience,
    please speak up in BoF or elsewhere.
- Cedric W
  - Alex and I drafted an ACP for bulk actions, will post a link for review.
    Will probably be more after tomorrow's meeting.
    - https://confluence.aweber.io/display/AR/Bulk-Action+Consumers+ACP
- Arnela M
- Andrew R
  - Pydantic's awesome!
- Amber H
  - Working with the analytics ingestion service; it's pretty awesome for opens
    and clicks reporting. Currently in the development stage.
  - Talked with Scott the other day about how to do cross-team code reviews,
    and wanted to float the idea here of having people commit some time each
    sprint to reviewing code from other teams.
- Alex C
- Correl R
  - Perl stinks. Dereferencing data structures from scalars with weird symbols
    stinks.
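Dave S's point in the Redis discussion (we treat Redis as a cache, but a failed or slow connect still takes the service down) can be sketched as a cache lookup whose failure path is identical to a miss. This is a minimal sketch, assuming an asyncio client whose lookup coroutine either returns ~None~ on a miss, raises an ~OSError~ subclass, or hangs when Redis is unreachable. =get_with_fallback=, =cache_get= and =load_from_source= are hypothetical names, and the 50 ms timeout is an arbitrary placeholder. A real client such as aioredis raises its own exception classes, which would need adding to the =except= clause.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical signatures: the cache lookup returns None on a miss, and the
# loader reads from the source of truth (database, upstream service, ...).
CacheGet = Callable[[str], Awaitable[Optional[bytes]]]
Loader = Callable[[str], Awaitable[bytes]]


async def get_with_fallback(
    cache_get: CacheGet,
    load_from_source: Loader,
    key: str,
    timeout: float = 0.05,  # placeholder value; would need tuning
) -> bytes:
    """Treat the cache as optional: an unreachable or slow cache acts like a miss."""
    try:
        # Bound the wait so a hung connect or command cannot stall the request.
        # This only helps if the client awaits rather than blocking the loop.
        cached = await asyncio.wait_for(cache_get(key), timeout=timeout)
    except (asyncio.TimeoutError, OSError):
        # asyncio.TimeoutError covers the deadline; OSError covers refused or
        # reset connections. A real client's exception classes go here too.
        cached = None
    if cached is not None:
        return cached
    return await load_from_source(key)
```

The design choice is that "Redis is down" and "key not found" take the same code path, which is what "applications should be able to work without a cache" asks for.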
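Josh B's HAProxy suggestion (speak the Redis protocol to find the primary) comes down to asking each node for its replication role. HAProxy can do this itself with a =tcp-check= that sends =INFO replication= and expects the string =role:master=. A rough Python sketch of the same decision, assuming the =INFO replication= text has already been fetched from each node (fetching is left out; =parse_info= and =find_primary= are hypothetical helpers):

```python
from typing import Dict, Mapping, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse Redis INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():  # INFO replies use CRLF line endings
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key] = value
    return fields


def find_primary(info_by_node: Mapping[str, str]) -> Optional[str]:
    """Return the first node whose INFO replication output reports role:master."""
    for node, info in info_by_node.items():
        if parse_info(info).get("role") == "master":
            return node
    return None  # no primary visible, e.g. mid-failover
```

If more than one node reports =role:master= (for example, during a split-brain), this sketch simply returns the first; a real router would need to handle that case more carefully.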
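On Gavin R's dataclass row factory: psycopg 3 ships one as =class_row= in =psycopg.rows=. The stdlib-only sketch below only illustrates the shape of that mechanism, assuming psycopg 3's documented row-factory protocol (a callable that takes a cursor and returns a callable that turns one row of values into an object). =dataclass_row= and =Subscriber= are hypothetical names for illustration, not part of psycopg.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Type, TypeVar

T = TypeVar("T")


def dataclass_row(cls: Type[T]) -> Callable[[Any], Callable[[Sequence[Any]], T]]:
    """Hypothetical stand-in for psycopg.rows.class_row, for illustration only."""

    def factory(cursor: Any) -> Callable[[Sequence[Any]], T]:
        # Read the column names once per result set, from cursor.description.
        names = [column.name for column in cursor.description]

        def make_row(values: Sequence[Any]) -> T:
            # Pair each column name with its value and pass them as keywords.
            return cls(**dict(zip(names, values)))

        return make_row

    return factory


@dataclass
class Subscriber:  # hypothetical table shape
    id: int
    email: str
```

With psycopg 3 itself, the equivalent is =conn.cursor(row_factory=class_row(Subscriber))=. Since a Pydantic model also accepts keyword arguments, it should slot into the same place as the dataclass.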
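Andrew R's complaint about =IN= amounts to generating one placeholder per value. A minimal sketch of doing that by hand, assuming psycopg's =%s= placeholder style; =in_clause= is a hypothetical helper. The values stay query parameters rather than being pasted into the SQL string, and the column name must come from trusted code. Gavin R's array alternative needs no helper, since psycopg adapts a Python list to a PostgreSQL array: =cur.execute("SELECT ... WHERE id = ANY(%s)", [ids])=.

```python
from typing import Any, List, Sequence, Tuple


def in_clause(column: str, values: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Build "column IN (%s, %s, ...)" plus its parameter list.

    `column` is interpolated directly, so it must be a trusted identifier;
    the values are passed separately so the driver handles quoting.
    """
    if not values:
        # "IN ()" is a syntax error in PostgreSQL; an empty list matches nothing.
        return "FALSE", []
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} IN ({placeholders})", list(values)
```

For example, =clause, params = in_clause("id", ids)= followed by =cur.execute(f"SELECT * FROM subscribers WHERE {clause}", params)=, where =subscribers= is a hypothetical table.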