Wed, 20 May 2026

Skymizer Taiwan Inc. unveils new AI inference architecture for ultra LLMs

Photo by Google DeepMind: https://www.pexels.com/photo/an-artist-s-illustration-of-artificial-intelligence-ai-this-image-was-inspired-by-neural-networks-used-in-deep-learning-it-was-created-by-novoto-studio-as-part-of-the-visualising-ai-pr-17483874/

Skymizer Taiwan Inc. has unveiled a new AI inference architecture designed to run ultra-large language models (LLMs) on a single card. As enterprises seek alternatives to cloud-based AI inference amid rising costs and growing concerns about data sovereignty, the solution aims to significantly reduce their reliance on large GPU clusters.

“The era of needing superscalar GPU clusters for ultra-large LLMs is over. HyperThought shifts AI from hyperscaler-only complexity to single-card simplicity for every enterprise,” William Wei, chief marketing officer, Skymizer, said.

AI inference architecture

The company showcased its HTX301 inference chip, a key component built on Skymizer’s HyperThought platform. A single PCIe card with six HTX301 chips and 384GB of memory can locally run inference on models with up to 700 billion parameters, consuming approximately 240 watts.
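To put those card figures in perspective, the following back-of-envelope sketch derives the implied memory budget per parameter. It is an illustration only. It assumes decimal gigabytes, an even split of memory and power across the six chips, and that all 384GB is available for weights, which ignores KV cache and activations. The article does not state any of these, nor the numeric precision the platform uses. The function name `card_budget` is hypothetical.

```python
# Figures quoted in the article.
CHIPS_PER_CARD = 6
CARD_MEMORY_GB = 384
CARD_POWER_W = 240        # approximate
MAX_PARAMS = 700e9

# Assumption (not from the article): 1 GB = 10**9 bytes.
BYTES_PER_GB = 1e9


def card_budget():
    """Return rough per-chip and per-parameter budgets for one card."""
    total_bytes = CARD_MEMORY_GB * BYTES_PER_GB
    bytes_per_param = total_bytes / MAX_PARAMS
    return {
        # Assumption: memory and power are split evenly across the chips.
        "memory_per_chip_gb": CARD_MEMORY_GB / CHIPS_PER_CARD,
        "power_per_chip_w": CARD_POWER_W / CHIPS_PER_CARD,
        # Upper bound: every byte of card memory is spent on weights.
        "max_bytes_per_param": bytes_per_param,
        "max_bits_per_param": bytes_per_param * 8,
    }


if __name__ == "__main__":
    for name, value in card_budget().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the budget works out to a little under 4.4 bits per parameter, which suggests that a 700-billion-parameter model would need its weights stored at roughly 4-bit precision or lower to fit on one card.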

The architecture aims to enable enterprises to deploy AI workloads on-premises, enhancing operational control and data privacy, while reducing dependence on large GPU clusters.

Designed for scalability, HyperThought supports deployment of AI models ranging from 4 billion to 700 billion parameters across various form factors.

Particularly suited for agentic AI workflows, HTX301 enables applications across financial services, healthcare, manufacturing, legal services, government and defence, retail, software engineering, and semiconductor design.

The platform uses a prefill/decode (P/D) disaggregation strategy, separating the compute-intensive prompt-processing phase from the memory-intensive token-generation phase. This allows existing GPUs to handle prefill tasks while HTX301 cards optimise decode-heavy inference.
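The split between the two phases can be sketched in a few lines of Python. This is a toy illustration of the general P/D disaggregation idea, not Skymizer's software stack: the class names, the `KVCache` structure, and the `toy_next_token` stand-in for a model forward pass are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Stand-in for the attention state handed from prefill to decode."""
    entries: list = field(default_factory=list)


def toy_next_token(cache):
    # Placeholder for a real model forward pass (assumption for illustration):
    # the next token is the sum of all cached tokens modulo 100.
    return sum(cache.entries) % 100


class PrefillWorker:
    """Hypothetical compute-intensive stage: processes the whole prompt at once."""

    def run(self, prompt_tokens):
        cache = KVCache(entries=list(prompt_tokens))
        first_token = toy_next_token(cache)
        return cache, first_token


class DecodeWorker:
    """Hypothetical memory-intensive stage: emits one token per step,
    reading the full cache each time."""

    def run(self, cache, first_token, max_new_tokens):
        output = [first_token]
        cache.entries.append(first_token)
        while len(output) < max_new_tokens:
            token = toy_next_token(cache)
            cache.entries.append(token)
            output.append(token)
        return output


def generate(prompt_tokens, max_new_tokens):
    # In a disaggregated deployment these two calls would run on different
    # hardware, with the cache transferred between them.
    cache, first = PrefillWorker().run(prompt_tokens)
    return DecodeWorker().run(cache, first, max_new_tokens)


if __name__ == "__main__":
    print(generate([1, 2, 3], 3))
```

The point of the structure is that prefill runs once per request over the full prompt, while decode loops once per generated token and re-reads the growing cache on every step, so the two stages stress different resources and can be scheduled on different devices.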

“Purpose-built decode hardware paired with an intelligent software stack that orchestrates every inference workload — that’s how you disaggregate P/D at scale,” said Luba Tang, chief technology officer, Skymizer.
