lennxa

HN categorizer

Source | #projects, #tech

hnam

A friend only wanted to read posts related to biology on HN - more generally, anything not tech-related. So I had Sonnet-3.6 quickly whip up a site that categorizes posts based only on their titles. The flow is:

  1. fetch the top 500 posts (HN API)
  2. for each post, fetch the item details (title) (HN API)
  3. categorize each post using MiniLM-L6
  4. display the posts
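
Steps 1 and 2 can be sketched roughly as below. This is a minimal sketch, not the site's actual code: the two endpoints are the public HN Firebase API, while `fetchTopPosts`, `GetJson`, `httpGetJson`, and `Post` are hypothetical names made up for illustration. The transport is passed in as a parameter so the logic can be exercised without hitting the network.

```typescript
// Sketch of steps 1 and 2: list the top story IDs, then fetch each item's title.
// The endpoints are the public HN Firebase API; all helper names are hypothetical.
const HN_API = "https://hacker-news.firebaseio.com/v0";

type GetJson = (url: string) => Promise<unknown>;

interface Post {
  id: number;
  title: string;
}

// Default transport: the global fetch available in browsers and Node 18+.
const httpGetJson: GetJson = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HN API returned ${res.status} for ${url}`);
  return res.json();
};

async function fetchTopPosts(limit = 500, getJson: GetJson = httpGetJson): Promise<Post[]> {
  // Step 1: topstories.json returns an array of up to 500 item IDs.
  const ids = (await getJson(`${HN_API}/topstories.json`)) as number[];

  // Step 2: one request per item, issued concurrently.
  const items = await Promise.all(
    ids.slice(0, limit).map((id) => getJson(`${HN_API}/item/${id}.json`))
  );

  // Deleted or missing items can come back as null or without a title; skip them.
  const posts: Post[] = [];
  for (const item of items) {
    const candidate = item as { id?: number; title?: string } | null;
    if (candidate && typeof candidate.id === "number" && typeof candidate.title === "string") {
      posts.push({ id: candidate.id, title: candidate.title });
    }
  }
  return posts;
}
```

Calling `fetchTopPosts()` with no arguments uses the real API. Firing all item requests at once keeps the sketch short; a real page might batch them.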

The best part is that everything runs on the client! Transformers.js wipes the floor. It's hosted on Netlify here, and the source can be found here.
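
For a sense of what the client-side categorization might look like, here is a hedged sketch. It assumes each category is described by a query string and that a title goes to the category whose query embedding is closest by cosine similarity; the source does not confirm that this is exactly how the site does it. `categorize`, `cosineSimilarity`, and the `Embed` type are hypothetical names, and the embedding function is injected rather than hard-wired to a model.

```typescript
// Sketch of the categorization step: assign each title to the category whose
// query string is closest in embedding space. This is an assumed approach.
type Embed = (texts: string[]) => Promise<number[][]>;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

async function categorize(
  titles: string[],
  categories: Record<string, string>, // category name -> query string
  embed: Embed
): Promise<string[]> {
  const names = Object.keys(categories);
  if (names.length === 0) throw new Error("at least one category is required");

  // Embed the query strings once, then every title.
  const queryVectors = await embed(names.map((name) => categories[name]));
  const titleVectors = await embed(titles);

  // Pick the best-scoring category for each title.
  return titleVectors.map((titleVector) => {
    let best = 0;
    let bestScore = cosineSimilarity(titleVector, queryVectors[0]);
    for (let i = 1; i < names.length; i++) {
      const score = cosineSimilarity(titleVector, queryVectors[i]);
      if (score > bestScore) {
        best = i;
        bestScore = score;
      }
    }
    return names[best];
  });
}
```

In the browser, `embed` would presumably wrap a Transformers.js `feature-extraction` pipeline with mean pooling and normalization. The exact model ID is an assumption here; `Xenova/all-MiniLM-L6-v2` is one MiniLM-L6 checkpoint that Transformers.js can load.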

It's not even close to perfect and messes up the categorization quite a bit - but it's better than searching for 'biology' on Algolia.

To improve,

  1. improve the query strings to optimize for biology-related posts; where categories overlap, I want a way to map a post to multiple categories
  2. (more involved) run a backend that categorizes each item once using an LLM and caches the result - shouldn't be difficult to implement
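
Both ideas could look roughly like the sketch below. Everything in it is an assumption for illustration: `pickCategories`, `withCache`, and the `0.3` threshold are made up, and an in-memory `Map` stands in for whatever store a real backend would use.

```typescript
// Idea 1: allow several categories per post by keeping every category whose
// similarity score clears a threshold. The 0.3 default is an arbitrary placeholder.
function pickCategories(scores: Record<string, number>, threshold = 0.3): string[] {
  const ranked = Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
  const above = ranked.filter((name) => scores[name] >= threshold);
  // Fall back to the single best match so every post lands somewhere.
  return above.length > 0 ? above : ranked.slice(0, 1);
}

// Idea 2: classify each item once and reuse the result on later requests.
function withCache<T>(
  classify: (id: number, title: string) => Promise<T>
): (id: number, title: string) => Promise<T> {
  const cache = new Map<number, Promise<T>>();
  return (id, title) => {
    let hit = cache.get(id);
    if (!hit) {
      hit = classify(id, title);
      cache.set(id, hit);
      // Drop failed attempts so a later call can retry.
      hit.catch(() => cache.delete(id));
    }
    return hit;
  };
}
```

Since HN item IDs are stable, keying the cache by ID means the expensive LLM call happens at most once per post, however many visitors load the page.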

#links #projects #tech