Launching Soon

Pure Python.
Pure Speed.

The first LLM inference engine written 100% in Python + Triton. No C++. No CUDA bloat. Just hackable performance.

Join 500+ developers awaiting early access.

A Glimpse of the Future

What to expect when Pyxis launches.

Coming Soon

Blazing Fast

Optimized inference pipelines with minimal overhead. Pyxis delivers sub-millisecond latency for production workloads.

Easy Integration

Simple API that works with your existing ML workflow. Load models from PyTorch, TensorFlow, or ONNX with just a few lines of code.

Production Ready

Built-in metrics, health checks, and observability. Deploy with confidence using battle-tested infrastructure patterns.

Multi-Framework

Native support for PyTorch, TensorFlow, and ONNX models. Switch between frameworks without changing your inference code.

GPU Acceleration

Seamless GPU acceleration with automatic memory management. Scale from CPU to multi-GPU with a single configuration change.

Batch Processing

Intelligent batching and request queuing for maximum throughput. Process millions of predictions efficiently.
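To make the batching idea concrete, here is a minimal sketch of dynamic request batching in plain Python. The `BatchQueue` class and its parameters are illustrative only, not Pyxis's actual API:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class BatchQueue:
    """Illustrative dynamic batcher: queued requests are grouped into
    fixed-size batches so the model runs once per batch, not once per request."""
    max_batch_size: int = 8
    _pending: list = field(default_factory=list)

    def submit(self, request: Any) -> None:
        # Enqueue a request; in a real engine this would wake a worker.
        self._pending.append(request)

    def drain(self) -> list:
        # Flush everything pending as a list of batches, each at most
        # max_batch_size long, then clear the queue.
        batches = [
            self._pending[i : i + self.max_batch_size]
            for i in range(0, len(self._pending), self.max_batch_size)
        ]
        self._pending.clear()
        return batches

queue = BatchQueue(max_batch_size=4)
for i in range(10):
    queue.submit(f"prompt-{i}")
print([len(b) for b in queue.drain()])  # -> [4, 4, 2]
```

A production batcher would also cap how long a request waits (a latency budget) before a partial batch is flushed; this sketch only shows the grouping step.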

Frequently Asked Questions

What makes Pyxis different?

Pyxis is built 100% in Python and Triton, removing the need for massive C++ binaries and CUDA bloat while maintaining sub-millisecond latency. It's truly hackable inference.

How do I get access?

We are currently in private alpha. Join the waitlist and you'll be among the first to get access when the beta opens soon.

Which models will be supported?

At launch, Pyxis will support popular models like Llama 3, Mistral, and Qwen natively, with an easy API for adding your own custom architectures.