| author | Claude <noreply@anthropic.com> | 2026-01-12 08:58:04 +0000 |
|---|---|---|
| committer | Claude <noreply@anthropic.com> | 2026-01-12 08:58:04 +0000 |
| commit | 8524487824f7332223b24e75ab327bf6ec5eccc9 (patch) | |
| tree | c44fb10d82e34d55479cefc62496517c749db3b6 /backend/db/migrations | |
| parent | 485486c7ff986712ecb09e92217236d276d317c4 (diff) | |
refactor: deduplicate articles at insertion time instead of query time

Change deduplication strategy from query-time (ROW_NUMBER window function)
to insertion-time (global guid check before insert).

Benefits:
- Simpler queries without CTE/window functions
- Consistent read state (no duplicate articles to manage)
- Better query performance (no per-query deduplication overhead)

Changes:
- Add CheckArticleExistsByGUID query for global guid lookup
- Add migration to remove existing duplicate articles
- Modify fetchOneFeed and AddFeed to skip duplicates on insert
- Revert GetUnreadArticles/GetReadArticles to simple queries
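
The insertion-time strategy described above can be sketched as a check-then-insert step. This is a minimal illustration only, assuming an SQLite database and a simplified `articles` schema; the real schema, the actual `CheckArticleExistsByGUID` query text, and the helper names `article_exists_by_guid` and `insert_article_if_new` are not shown in this commit and are hypothetical here.

```python
import sqlite3

# Hypothetical, simplified schema: the real articles table is not part of this diff.
SCHEMA = """
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL,
    guid TEXT NOT NULL,
    title TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_articles_guid ON articles(guid);
"""


def article_exists_by_guid(conn, guid):
    """Global guid lookup across all feeds (stand-in for CheckArticleExistsByGUID)."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM articles WHERE guid = ?)", (guid,)
    ).fetchone()
    return bool(row[0])


def insert_article_if_new(conn, feed_id, guid, title):
    """Insert the article unless its guid is already stored; return True if inserted."""
    if article_exists_by_guid(conn, guid):
        return False
    conn.execute(
        "INSERT INTO articles (feed_id, guid, title) VALUES (?, ?, ?)",
        (feed_id, guid, title),
    )
    return True


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    results = [
        insert_article_if_new(conn, 1, "guid-a", "First"),
        # Same guid arriving through a different feed is skipped.
        insert_article_if_new(conn, 2, "guid-a", "Same article via another feed"),
        insert_article_if_new(conn, 2, "guid-b", "Second"),
    ]
    count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
    conn.close()
    return results, count
```

Because the check and the insert are separate statements, this sketch is only safe with a single writer; concurrent fetchers would need a transaction or a unique constraint on `guid`.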
Diffstat (limited to 'backend/db/migrations')
| -rw-r--r-- | backend/db/migrations/005_add_guid_index.sql | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/backend/db/migrations/005_add_guid_index.sql b/backend/db/migrations/005_add_guid_index.sql
index a653d79..e3625ee 100644
--- a/backend/db/migrations/005_add_guid_index.sql
+++ b/backend/db/migrations/005_add_guid_index.sql
@@ -1,2 +1,10 @@
--- Add index on guid for deduplication queries
+-- Add index on guid for deduplication
 CREATE INDEX IF NOT EXISTS idx_articles_guid ON articles(guid);
+
+-- Remove duplicate articles by guid, keeping only the one with the smallest id
+DELETE FROM articles
+WHERE id NOT IN (
+    SELECT MIN(id)
+    FROM articles
+    GROUP BY guid
+);
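
To see what the cleanup statement in this migration does, the sketch below runs the same `DELETE` against a small in-memory SQLite table. The database engine, the reduced three-column table, and the sample rows are assumptions made for illustration; only the `DELETE` statement itself is taken from the migration.

```python
import sqlite3

# The DELETE statement from the migration, unchanged.
DEDUP_SQL = """
DELETE FROM articles
WHERE id NOT IN (
    SELECT MIN(id)
    FROM articles
    GROUP BY guid
);
"""


def run_dedup_demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal table: the real articles schema has more columns.
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, feed_id INTEGER, guid TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (id, feed_id, guid) VALUES (?, ?, ?)",
        [
            (1, 1, "guid-a"),  # kept: smallest id for guid-a
            (2, 2, "guid-a"),  # removed: duplicate of id 1
            (3, 1, "guid-b"),  # kept: only row for guid-b
            (4, 3, "guid-a"),  # removed: duplicate of id 1
        ],
    )
    conn.execute(DEDUP_SQL)
    remaining = conn.execute("SELECT id, guid FROM articles ORDER BY id").fetchall()
    conn.close()
    return remaining
```

Calling `run_dedup_demo()` returns `[(1, 'guid-a'), (3, 'guid-b')]`. One caveat, relevant only if `guid` is nullable (not shown in this diff): `GROUP BY` places all NULL guids in a single group, so the statement would keep just one such row.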
