2022-03-29
Get list of subscribers with non-normalized tags on active accounts
Investigating the impact of the new Tag Normalization rules on existing subscribers of active accounts.
Gathering data
I imported a dump of the subscriber_tags table from AppDB, as well as the list.subscribers table data for all active accounts, selected with:
SELECT s.* FROM list.subscribers s
JOIN accounts a ON (a.a_id = s.account_id)
WHERE a.status_id < 7;
I then built a table of subscribers whose tags do not match our validation rules, i.e. tags that change when passed through normalize_tag().
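-- One row per (subscriber, tag) where the tag is not already normalized.
CREATE TABLE invalid_tags AS
SELECT s.list_id, s.account_id, t.subscriber_id, tag
FROM subscribers s
JOIN subscriber_tags AS t ON (s.id = t.subscriber_id)
-- Expand each subscriber's tags array into one row per tag.
CROSS JOIN LATERAL unnest(t.tags) AS tag
WHERE tag != normalize_tag(tag);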
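The definition of normalize_tag() is not part of this note. As a rough aid for reading the breakdown further down, here is a hypothetical PL/pgSQL sketch that applies the rules the breakdown counts. The name normalize_tag_sketch is made up for this example so it cannot overwrite the real function, and the assumed rules (drop non-printable characters, commas and quotation marks; collapse and trim whitespace; lower-case) are inferred from the breakdown, not taken from the actual implementation.

```sql
-- Hypothetical sketch only: NOT the real normalize_tag(). It mirrors the
-- rules counted in the "Normalized tag breakdown" query, as an assumption.
CREATE OR REPLACE FUNCTION normalize_tag_sketch(tag text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
STRICT
AS $$
DECLARE
  t text := tag;
BEGIN
  -- Turn tabs, newlines and other whitespace into plain spaces first,
  -- so they are not simply deleted by the non-printable rule below.
  t := regexp_replace(t, '[[:space:]]', ' ', 'g');
  -- Drop non-printable characters, commas, and ASCII/Unicode quotation marks.
  -- ('' inside the literal is an escaped single quote.)
  t := regexp_replace(t, '[^[:print:]]|[,''"‘’“”]', '', 'g');
  -- Collapse runs of spaces, including ones left behind by the removals.
  t := regexp_replace(t, ' {2,}', ' ', 'g');
  -- Strip leading/trailing spaces and lower-case the result.
  RETURN lower(btrim(t));
END;
$$;
```

To compare it with the real function on a sample, something like SELECT tag, normalize_tag(tag), normalize_tag_sketch(tag) FROM invalid_tags LIMIT 20; would show where the assumptions diverge. Note that how [[:print:]] treats non-ASCII characters depends on the database locale.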
Active accounts
SELECT COUNT(DISTINCT account_id) FROM subscribers;
count |
---|
103,357 |
Subscribers on active accounts
SELECT COUNT(id) FROM subscribers;
count |
---|
259,745,858 |
Subscribers with invalid tags
SELECT COUNT(DISTINCT subscriber_id) FROM invalid_tags;
count |
---|
1,331,220 |
Accounts with subscribers with invalid tags
SELECT COUNT(DISTINCT account_id) FROM invalid_tags;
count |
---|
3,220 |
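As proportions of the totals above (subscriber counts over the 259,745,858 subscribers, account counts over the 103,357 active accounts):

```latex
\frac{1{,}331{,}220}{259{,}745{,}858} \approx 0.51\%\ \text{of subscribers on active accounts},
\qquad
\frac{3{,}220}{103{,}357} \approx 3.12\%\ \text{of active accounts}
```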
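Normalized tag breakdown
The rules overlap, so a single tag can be counted under more than one of them. The Subscribers column is COUNT(subscriber_id) over invalid_tags, which has one row per offending tag on each subscriber, so it counts tag occurrences rather than distinct subscribers. This is why the ASCII quotation marks row (1,544,607) is larger than the 1,331,220 distinct subscribers with invalid tags.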
SELECT 'Non-printable characters' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE tag ~ '[^[:print:]]'
UNION ALL SELECT 'Commas' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE tag ~ ','
UNION ALL SELECT 'ASCII quotation marks' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE tag ~ '[''"]'  -- '' is an escaped single quote: matches ' or "
UNION ALL SELECT 'Unicode quotation marks' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE tag ~ '[‘’“”]'
UNION ALL SELECT 'Leading or trailing whitespace' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE TRIM(tag) != tag  -- TRIM() strips spaces only, not tabs or newlines
UNION ALL SELECT 'Repeated whitespace' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
-- [[:space:]] is the POSIX whitespace class. The results below were produced
-- with '[:space:]{2,}', which matches runs of the characters : s p a c e
-- rather than whitespace, so the Repeated whitespace row may be overstated.
WHERE TRIM(tag) ~ '[[:space:]]{2,}'
UNION ALL SELECT 'Upper-case characters' AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
WHERE LOWER(tag) != tag;
Rule | Accounts | Subscribers |
---|---|---|
Leading or trailing whitespace | 119 | 66,788 |
Repeated whitespace | 2,404 | 1,234,651 |
Unicode quotation marks | 126 | 21,343 |
Commas | 378 | 54,567 |
ASCII quotation marks | 2,507 | 1,544,607 |
Upper-case characters | 0 | 0 |
Non-printable characters | 58 | 1,749 |
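The breakdown query scans invalid_tags once per rule. A possible single-pass alternative, sketched here under the assumption that the same predicates are wanted, is a small classifier that returns every rule a tag breaks; tag_rule_violations is a hypothetical name invented for this example. It uses the POSIX [[:space:]] class for the repeated-whitespace check.

```sql
-- Hypothetical helper: returns the names of the breakdown rules that a tag
-- violates, using the same predicates as the breakdown query.
CREATE OR REPLACE FUNCTION tag_rule_violations(tag text)
RETURNS text[]
LANGUAGE sql
IMMUTABLE
STRICT
AS $$
  -- Each CASE yields a rule name or NULL; array_remove() drops the NULLs.
  SELECT array_remove(ARRAY[
    CASE WHEN tag ~ '[^[:print:]]' THEN 'Non-printable characters' END,
    CASE WHEN tag ~ ',' THEN 'Commas' END,
    CASE WHEN tag ~ '[''"]' THEN 'ASCII quotation marks' END,
    CASE WHEN tag ~ '[‘’“”]' THEN 'Unicode quotation marks' END,
    CASE WHEN TRIM(tag) <> tag THEN 'Leading or trailing whitespace' END,
    CASE WHEN TRIM(tag) ~ '[[:space:]]{2,}' THEN 'Repeated whitespace' END,
    CASE WHEN LOWER(tag) <> tag THEN 'Upper-case characters' END
  ], NULL);
$$;
```

The breakdown can then be computed in one pass:
SELECT violated_rule AS "Rule"
, COUNT(DISTINCT account_id) AS "Accounts"
, COUNT(subscriber_id) AS "Subscribers"
FROM invalid_tags
CROSS JOIN LATERAL unnest(tag_rule_violations(tag)) AS violated_rule
GROUP BY violated_rule;
Rules with no matching rows (such as Upper-case characters here) simply do not appear in the output, and because of the [[:space:]] class the Repeated whitespace figure would not necessarily match the table above.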