squeezlabs.github.io web signal

Squeez Labs' CrankGPT runs local AI on hand-crank power

TL;DR

  • CrankGPT runs a full voice-interactive AI pipeline on a Raspberry Pi 5 with 8GB RAM, powered solely by a 20W hand-crank generator.
  • Cold-start to functional conversation takes roughly 30 seconds; time to first token ranges from 0.8 to 2.9 seconds depending on model size.
  • Memory bandwidth, not raw compute, is the primary bottleneck for on-device LLM inference: an Orange Pi 5 Pro with LPDDR5 generated tokens 29-58% faster than the Raspberry Pi 5 with its LPDDR4X.

The premise of CrankGPT is almost absurdly literal: you turn a hand crank, speak a question, and a small language model answers you, with no wall socket, no Wi-Fi, and no data center involved. Squeez Labs built this as a working device, not a concept sketch. According to the project page, it has generated images, created poetry, and written code, all through locally processed AI running on a single-board computer.

The hardware stack is deliberately modest. A 20W hand-crank generator powers a Raspberry Pi 5 with 8GB RAM, which runs speech recognition, language model inference, and text-to-speech entirely on CPU with no accelerators. The Pi's power draw climbs to around 15W during inference, with current spikes reaching 5A, which the generator cannot deliver smoothly. To bridge the gap, the team built a custom capacitor board that buffers the generator's output and provides roughly 20 seconds of power reserve during intensive processing. Excluding the Raspberry Pi, the parts list runs to approximately $100; the board itself adds around $200.
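The power figures above can be checked with some back-of-envelope arithmetic. The sketch below is only an illustration: the 5 V rail and the buffer's voltage window are assumptions, not details from the project page, and the function names are hypothetical. It ignores converter losses, so a real buffer would need more capacitance than this estimate.

```python
# Rough power budget using the figures quoted in the text.
# ASSUMPTIONS (not from the source): the 5 A spikes occur on a 5 V rail,
# and the buffer discharges between two hypothetical voltages.

GENERATOR_W = 20.0     # hand-crank generator rating (from the text)
INFERENCE_W = 15.0     # approximate draw during inference (from the text)
PEAK_CURRENT_A = 5.0   # current spikes (from the text)
RESERVE_S = 20.0       # buffered reserve time (from the text)
RAIL_V = 5.0           # ASSUMPTION: supply rail voltage


def peak_power_w(current_a: float, rail_v: float) -> float:
    """Instantaneous power for a given current on a given rail."""
    return current_a * rail_v


def reserve_energy_j(load_w: float, seconds: float) -> float:
    """Energy needed to carry a constant load for a given time."""
    return load_w * seconds


def required_capacitance_f(energy_j: float, v_full: float, v_min: float) -> float:
    """Capacitance that releases energy_j while dropping from v_full to v_min.

    Uses E = 1/2 * C * (v_full^2 - v_min^2), with no conversion losses.
    """
    return 2.0 * energy_j / (v_full ** 2 - v_min ** 2)


if __name__ == "__main__":
    peak = peak_power_w(PEAK_CURRENT_A, RAIL_V)
    energy = reserve_energy_j(INFERENCE_W, RESERVE_S)
    print(f"peak demand: {peak:.0f} W vs generator rating {GENERATOR_W:.0f} W")
    print(f"energy for {RESERVE_S:.0f} s at {INFERENCE_W:.0f} W: {energy:.0f} J")
    # Hypothetical voltage window, chosen only to show the calculation.
    print(f"capacitance for 5.4 V -> 3.0 V: {required_capacitance_f(energy, 5.4, 3.0):.1f} F")
```

Under these assumptions the spikes imply a 25 W peak, which exceeds the generator's 20 W rating, and a 20-second reserve at 15W works out to 300 joules. Both are consistent with the need for a buffer between crank and computer.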

Software choices reflect the same constraint-first logic. Moonshine, described as "by far the fastest option for CPU-based ASR," handles speech recognition and feeds into language models from Liquid AI's LFM2 line (350M and 1.2B parameters) and Gemma 3 (1B), all running via llama.cpp. Time to first token ranges from about 0.8 seconds for the LFM2 350M model to about 2.9 seconds for Gemma 3 1B. Cold start from cranking to functional conversation takes around 30 seconds total. One finding worth noting for anyone choosing edge hardware: memory bandwidth, not raw compute, is the primary constraint on token generation speed. An Orange Pi 5 Pro with LPDDR5 RAM achieved 29 to 58 percent faster output than the Raspberry Pi 5 with its LPDDR4X.
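The memory-bandwidth finding has a simple intuition: to generate each token, the model's weights have to be streamed from RAM roughly once, so bandwidth divided by model size gives a ceiling on tokens per second. The sketch below illustrates that ceiling. The 4-bit quantization and the 17 GB/s bandwidth are illustrative assumptions rather than measurements from the project, and the function names are hypothetical. Real throughput will be lower, since this ignores the KV cache, activations, and compute time.

```python
# Bandwidth-bound ceiling on token generation speed.
# ASSUMPTIONS (not from the source): weights quantized to 4 bits each,
# every weight read once per token, 17 GB/s of usable memory bandwidth.


def decode_tokens_per_s_bound(params: float, bits_per_weight: float,
                              bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if each token streams all weights once."""
    weight_bytes = params * bits_per_weight / 8.0
    return bandwidth_gb_s * 1e9 / weight_bytes


def relative_speedup(baseline_gb_s: float, faster_gb_s: float) -> float:
    """Fractional speedup predicted by the bound; independent of model size."""
    return faster_gb_s / baseline_gb_s - 1.0


if __name__ == "__main__":
    bandwidth = 17.0  # GB/s, illustrative only
    for name, params in [("350M", 350e6), ("1B", 1.0e9), ("1.2B", 1.2e9)]:
        bound = decode_tokens_per_s_bound(params, 4, bandwidth)
        print(f"{name}: at most {bound:.1f} tokens/s at {bandwidth} GB/s")
```

In this simplified model the ceiling scales linearly with bandwidth, which is why a board with faster memory can pull ahead even when its CPU is no stronger.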

The crank creates something no server rack offers: tactile feedback. When the language model runs inference, the crank gets harder to turn, giving the user a direct physical sense of computational load.

The honest caveat is that this is a builder's project, not a product. Squeez Labs is not manufacturing CrankGPT for sale, and the reporting does not confirm whether the custom capacitor board schematics or the voice agent code are fully open-sourced, which matters for anyone wanting to replicate it. The 350M-to-1.2B parameter models running here are also a long way from hosted alternatives in response quality, and the project page documents speed metrics but not answer accuracy. What it does prove is narrower but genuinely useful: a complete voice AI loop can run on cheap commodity hardware, powered entirely by a person, indefinitely.
