Tanishq Mathew Abraham, Ph.D.

model: https://t.co/Hh1XvEzaWJ abs: https://t.co/T2YtoxMa34

Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context (arxiv.org)

NVIDIA's Nemotron-TwoTower splits an LM into a frozen autoregressive context tower and a trainable diffusion denoiser with bidirectional block attention.
The system is built on Nemotron-3-Nano-30B-A3B, a 30B hybrid Mamba-Transformer MoE backbone, and trained on roughly 2.1 trillion tokens.
The authors report retaining 98.7% of the autoregressive baseline's quality while delivering 2.42x higher wall-clock generation throughput.
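
As a rough illustration of what "bidirectional block attention" can look like, the sketch below builds a block-wise attention mask in which tokens attend in both directions inside their own block and only backward to earlier blocks. This is an assumption about the general technique, not the mask used in Nemotron-TwoTower; the function name `block_bidirectional_mask` and the `block_size` parameter are hypothetical.

```python
def block_bidirectional_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed rule (illustrative only): a position sees every position in its
    own block (bidirectional) and every position in earlier blocks (causal
    across blocks). Later blocks stay hidden.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    mask = []
    for i in range(seq_len):
        query_block = i // block_size
        # Key j is visible when its block index does not exceed the query's.
        row = [(j // block_size) <= query_block for j in range(seq_len)]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for 6 tokens in blocks of 2 ("x" = visible, "." = hidden).
    for row in block_bidirectional_mask(6, 2):
        print("".join("x" if visible else "." for visible in row))
```

With blocks of size 2, positions 0 and 1 see each other but nothing later, while positions 4 and 5 see the whole sequence. A denoiser using such a mask could refine all tokens of one block jointly, which is one plausible source of the throughput gain the summary reports.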

Bluesky · 25d ago

Great work to the Meta AI team! Best part of it is they have open-sourced the code and plan to open-source data too! So you should be able to train your own brain-to-text model, assuming you have your own MEG! 😄 code: https://t.co/XF9z4JCzzq

GitHub - facebookresearch/brain2qwerty: Non-invasive decoding of typed sentences from MEG and EEG brain recordings using a convolutional encoder, transformer, and character-level language model. (github.com)

AI Weekly's analysis:

Meta's FAIR lab released Brain2Qwerty v2, a non-invasive MEG-to-text pipeline reaching an average 61% word accuracy across nine volunteers.
The system was trained on roughly 22,000 sentences per participant recorded over 10 hours, with the top participant reaching 78% word accuracy.
The original Brain2Qwerty study, run with 35 volunteers, is being published in Nature Neuroscience with a v1 MEG character error rate of 32%.
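
For readers unfamiliar with the metric, character error rate is conventionally the Levenshtein edit distance between the decoded text and the reference, divided by the reference length. The sketch below shows that standard definition; whether the Brain2Qwerty evaluation computes it exactly this way is an assumption, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into "" takes i deletions
        for j, hyp_char in enumerate(hyp, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a 32% character error rate means roughly one edit for every three reference characters. Word accuracy is a separate, coarser metric, so the v1 and v2 figures above are not directly comparable.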


Bluesky · 21d ago

abs: https://t.co/S5GD4ecvTf

Autodata: An agentic data scientist to create high quality synthetic data (arxiv.org)

AI Weekly's analysis:

Meta researchers introduce Autodata, a method that casts an AI agent as a data scientist iteratively generating and refining synthetic training data.
The practical implementation is called Agentic Self-Instruct, and meta-optimizing the data scientist agent itself produced a larger uplift than static methods.
On legal reasoning tasks, a 4B parameter model trained on agent-made data reportedly beat a 397B parameter baseline.


Bluesky · 26d ago

Articles & links