Open-source LLMs
in your pocket
Astra is an offline AI assistant for your phone. Built on the llama.cpp inference engine, it ships with Qwen3.5 0.8B and pulls major open-source models from Hugging Face and ModelScope in one tap. Local inference · Zero data upload · Works in airplane mode.
Why choose Astra
An on-device LLM runtime designed for mobile — cloud-grade intelligence, returned to you.
Fully offline
Subway, basement, airplane mode — Astra never goes dark. Every inference runs locally, no network required.
Privacy first
Your chats never leave the device. No uploads, no logs, no cloud sync. Privacy is the default.
Powered by llama.cpp
A best-in-class open-source inference engine, deeply optimized for mobile CPU / GPU / NPU.
Two model hubs
Seamless integration with Hugging Face and ModelScope. Search → download → load, in one flow.
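For the curious, the same search → download → load flow can be reproduced by hand with the Hugging Face CLI and llama.cpp. A minimal sketch — the repo and file names below are illustrative, not Astra's actual defaults:

```shell
# Fetch a quantized GGUF model from Hugging Face
# (repo and file names are examples, not Astra's built-ins)
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct-GGUF \
  qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir ./models

# Load it with llama.cpp's CLI and chat — fully offline once downloaded
llama-cli -m ./models/qwen2.5-0.5b-instruct-q4_k_m.gguf \
  -p "Hello from airplane mode!"
```

Astra wraps this whole flow behind a single tap; nothing here is required to use the app.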
Ready out of the box
Ships with Qwen3.5 0.8B — chat the moment you install, no setup, no extra downloads.
All platforms
Android and iOS are live now; macOS and Windows clients are on the way.
Popular models, one tap away
Deeply integrated with Hugging Face and ModelScope — search, install, done.
Alibaba's lightweight Tongyi Qianwen instruct model — Astra's built-in default, excellent in Chinese.
Meta's mid-size Llama 3.2 — strong multilingual abilities.
Google's Gemma 2 instruct — balanced quality and efficiency.
DeepSeek R1 distilled — strong chain-of-thought reasoning.
Microsoft's tiny powerhouse — exceptional parameter efficiency.
01.AI's bilingual Chinese-English chat model.
Set sail for the stars
Free, no account required. Run your own LLM on your own phone.