2021-11-04

Backend BOF

Scott M
Kevin V
Josh E
- Pycares was failing to install due to a dependency (safety virtual environment did not have wheel)
Josh B
- Redis issue in sessions resulting in rescheduling Kubernetes node workers
- May need to change how we're running certain Redises in k8s; the old workload took time to shut down, the new instance read old data, failures ensued.
- We need some measure of fault tolerance / HA
- Reasonable way to run a 3-pod redis that would be durable to that kind of failure?
- Be really clear about the use cases and failure cases of Redis or any persistent store in k8s.
- Redis usually used as a cache or database. Applications should be able to work without a cache. Databases should have durability.
- (Dave S) We're treating redis like a cache, but failing to consider connection timeouts and failures (blocking connect). Lack of data didn't affect the service, but lack of connectivity did.
- (Amber H) We have existing VMs, is that an option?
  - (Josh B) It is. There are challenges with client behaviors using HA Redis and handling failover. VMs aren't my first choice.
  - (Gavin R) We should focus on figuring out a workable k8s solution first. The underlying storage solution is in essence the same thing.
  - (Dave S) Aioredis doesn't have direct support for sharding or replication (needs an exception handling wrapper). Similar with TRedis.
  - (Gavin R) A fork of redis implements transparent clustering, which directs the client to the correct instance. (https://keydb.dev/)
  - (Josh B) Also has multi-master
  - (Dave S) The trouble is writes result in a denial response, leaving it to the client to find the correct instance it can write to
  - (Josh B) HAProxy could be told to talk the Redis protocol to find the primary and send traffic there for clients that can't handle the read-only response well.
- (Dave S) In k8s, even when running as a cache, Redis will occasionally (4-5 times in the past year) fail to write to disk when there's some confusion between Ceph and the underlying mount
  - (Josh B) This was a lower-level Ceph issue; a race condition based on our specific configuration.
- I will take time to put a Redis pattern together
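
A minimal sketch of the "cache should be optional" point raised above (lack of data was fine, lack of connectivity was not). It is not the Redis pattern Josh B will write up. `FailOpenCache`, `get_or_load`, and the 0.25 s timeout are hypothetical names and values; the wrapped client is only assumed to expose async `get(key)` and `set(key, value)`. Real client libraries raise their own exception types, which would need to be passed in via `errors`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Tuple, Type


class FailOpenCache:
    """Wrap an async cache client so that outages look like cache misses."""

    def __init__(
        self,
        client: Any,
        timeout: float = 0.25,
        errors: Tuple[Type[BaseException], ...] = (ConnectionError, OSError),
    ) -> None:
        # Assumption: `client` has async get(key) and set(key, value).
        self._client = client
        self._timeout = timeout
        # Timeouts are always treated as a cache failure.
        self._errors = (asyncio.TimeoutError,) + tuple(errors)

    async def get(self, key: str) -> Optional[Any]:
        try:
            # Bound the wait so a blocking connect cannot stall the request.
            return await asyncio.wait_for(self._client.get(key), self._timeout)
        except self._errors:
            return None  # treat any failure as a miss

    async def set(self, key: str, value: Any) -> bool:
        try:
            await asyncio.wait_for(self._client.set(key, value), self._timeout)
            return True
        except self._errors:
            return False  # the write is best-effort


async def get_or_load(
    cache: FailOpenCache, key: str, loader: Callable[[], Awaitable[Any]]
) -> Any:
    """Return the cached value, or load it from the source of truth."""
    cached = await cache.get(key)
    if cached is not None:
        return cached
    value = await loader()
    await cache.set(key, value)
    return value
```

With this shape, a hung or unreachable cache costs at most one timeout for the read and one for the best-effort write-back, so the timeout needs to be short relative to the request budget.
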
Ihar H
- Recently got feedback updating one of the service pipelines to meet standards, want to clarify expectations. We'd decided to split the stages
  - (Gavin R) I believe you can accomplish what you want to accomplish without creating so many explicit stages
  - (Ihar H) You have to click on a pipeline stage to see whether one of the nested pipeline tasks failed, and further stages could be run without the prior one passing
    - (Amber H / Gavin R) We do need to sometimes get deploys out even when acceptance tests fail
  - (Ihar H) Even if code is deployed to production, the pipeline will be marked as failed in some cases
  - (Gavin R) Your proposal is fine, we need to clarify best practices documentation to cover all cases.
  - (Dave S) GitLab visualization has changed over time; we can see if anything can be improved to make things visually clearer
  - (Gavin R) Since we are using immutable image building, there's little value in running unit and integration tests in staging
    - (Dave S) … they shouldn't
  - (Gavin R) The prior method was tying stages to environments, which we don't want to do, they are different things.
  - (Ihar H) I will update the Confluence page and follow up with Dave S
  - (Amber H) I want to see more of the why in that document
Gavin R
- Update on experiences with psycopg3: fixed the bug I found, and I've been using it pretty heavily. Seems to work well. The API is backwards compatible and has also evolved. Idiomatic row factory usage is different. One of the cool things is that if you use dataclasses, you can use a dataclass row factory.
- (Amber H) We just need them and Pydantic to have a baby
- (Gavin R) That probably wouldn't be too hard to do
- (Andrew R) I really don't like what has to be done to pass a value to the IN keyword. In Psycopg, you have to format the comma-separated value yourself.
  - (Gavin R) You can specify an array
  - (Andrew R) That works with = ANY (...), but is much less performant than using IN.
  - (Alex C) A newer version of Postgres may fix the performance issue
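
A small sketch of the two options discussed above, assuming psycopg's `%s` placeholder style. `in_clause` is a hypothetical helper, and the `subscribers` table and its columns are made up for illustration. It builds one placeholder per value so the values are never formatted into the SQL string by hand.

```python
from typing import Any, List, Sequence, Tuple


def in_clause(column: str, values: Sequence[Any]) -> Tuple[str, List[Any]]:
    """Return an SQL fragment with one %s placeholder per value, plus params.

    `column` is interpolated as-is, so it must be a trusted identifier.
    """
    if not values:
        # "IN ()" is a syntax error in PostgreSQL; match nothing instead.
        return "FALSE", []
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} IN ({placeholders})", list(values)


fragment, params = in_clause("id", [3, 5, 8])
query = f"SELECT id, email FROM subscribers WHERE {fragment}"

# With a psycopg cursor this would be executed as:
#     cursor.execute(query, params)
# The array form Gavin R mentioned passes the whole list as one parameter:
#     cursor.execute("SELECT id, email FROM subscribers WHERE id = ANY(%s)", [[3, 5, 8]])
```

The helper only removes the manual formatting step. Which form performs better is the open question from the discussion and is not settled here.
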
Eric T
David R
- Been doing a lot of front-end experimentation with the concept of module federation to power our micro-frontend architecture for our React clients, building an app shell to compose them into a single MFA. We should discuss soon what it'd look like to deploy something like that. Grossly simplifying, every MFE exposes a remoteEntry.js file as part of its build, and the shell application needs to be told in its configuration where that file lives so it can load the module when necessary. The route will then use the remote to load the module and its dependencies on demand in order to load that application. I believe we just need that remote to be built and available as part of our deployment strategy. Will want to get the POC I've been working on into staging to start getting it working outside of local development.
  - (Gavin R) This is the effort to replace Sites as our application shell? (Yes)
    - https://confluence.aweber.io/display/BETL/Decommissioning+Sites
    - (Gavin R) The larger implication to backend is sessions and sites going away and the front-end application using the Public API going forward.
    - (Scott M) Something will need to fetch AWVars data to provide to the applications, from the session service for now, as a dynamic service hosting the shell application?
    - (David R) It would just be static content / JavaScript, provided it is able to discover the contained applications.
    - (Gavin R) JS should be on aweber-static, the shell application should be hosted on something TBD, and we'll need to bridge the transition between using sessions for stored data to something else.
    - (David R) I see the session service as a middle state (stepping stone) to work towards no longer needing it.
    - (Scott M) We should look into how it can get direct access to the session service
      - (Gavin R) I don't think we want that to be a dependency. We need a global state managed in a better way
      - (Dave S) We should start putting thought into how we're using local storage, etc. in a consistent way so data can be cached and not re-fetched, avoiding having each service have its own data model.
      - (Josh B) The shell app could provide a common interface to access shared cache data.
  - (David R) Let's make sure there isn't an assumption that the scout file and the remote entry file are the same thing.
Dave S
- Had something, will post it; updating docs on the AWeber API and what's behind it.
  - Updated endpoint map on https://confluence.aweber.io/display/STD/api.aweber.com+Endpoint+Management
- We need someone in the company who really understands CORS. We're having CORS failures with companies like Facebook, etc. If anyone has experience, please speak up in BoF or elsewhere.
Cedric W
- Alex and I drafted an ACP for bulk actions, will post a link for review. Will probably be more after tomorrow's meeting.
  - https://confluence.aweber.io/display/AR/Bulk-Action+Consumers+ACP
Arnela M
Andrew R
- Pydantic's awesome!
Amber H
- Working with the analytics ingestion service, it's pretty awesome for opens and clicks reporting. Currently in the development staging
- Talked with Scott the other day about how to do cross-team code reviews, and wanted to float the idea here of people committing some time during a sprint to review code from other teams.
Alex C
Correl R
- Perl stinks. Dereferencing data structures from scalars with weird symbols stinks.
