Marco

Language and keyboard stuff at Google. I like computers and Korean and computers-and-Korean and high school CS education. Georgia Tech → 연세대학교 → 東京工業大学. Regrettably no longer based in Tokyo :/ https://theoreticallygoodwithcomputers.com/

Articles & links

The Maia3 model source code was released today! github.com/CSSLab/maia3

arxiv.org
Marco reposted
Marco @mcognetta.bsky.social

Wrote a blog post about the token encoding format in GPT tokenizers. Have you ever inspected a tokenizer and seen tokens like Ð¾Ð¶Ðµ or ĠÐ½ÑĥÐ¶Ð½Ð¾? These are tokens in a custom encoding format used for serialization of byte-level tokenizers. This post shows how to recover the…

Decoding GPT-Style Tokens (mcognetta.github.io)
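
As a rough illustration of the kind of encoding the post describes, here is a minimal sketch that assumes the format is the byte-to-character table used by GPT-2-style byte-level BPE tokenizers. The post excerpt does not say which table it covers, so treat that as an assumption. The helper names (`bytes_to_unicode`, `decode_token`, `encode_text`) are illustrative, not taken from the blog post. Each of the 256 byte values is mapped to one printable character, so a serialized token can be turned back into bytes and then read as UTF-8.

```python
def bytes_to_unicode():
    """Build a byte -> printable character table (GPT-2 style, assumed)."""
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in keep}
    # Every remaining byte (control characters, space, etc.) is shifted
    # to the next unused code point starting at 256.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def decode_token(token: str) -> str:
    """Map each character back to its byte, then read the bytes as UTF-8."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so invalid sequences are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")


def encode_text(text: str) -> str:
    """Inverse direction: UTF-8 bytes rendered in the serialized alphabet."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


if __name__ == "__main__":
    # Assumed fully encoded form of a token for the Russian word "нужно"
    # with a leading space (Ġ stands for the space byte).
    print(repr(decode_token("ĠÐ½ÑĥÐ¶Ð½Ð¾")))
```

Under this assumption, the leading Ġ is simply byte 0x20 (a space) shifted out of the non-printable range, and pairs such as Ð½ are the two UTF-8 bytes of a single Cyrillic letter.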

Recent commentary

The Claude Code use case that has saved me the most time so far is just "get this slightly out of date ML repo working on my workstation".


Requesting that everyone call me Dr. Parameter-Efficient Korean Character-Level Language Modeling from now on. I'm honestly so disappointed in spammers these days. No excuse in the age of LLMs.


One funny thing about ChatGPT is that when I ask it some technical question and it starts doing the "searching the web" and then flashing through websites, it always does some weird ones like IMDb. Idk what you are expecting to find there brother but you do you.

