OpenAI's open-weight models go local: gpt-oss-120b and gpt-oss-20b run offline, with no dependence on cloud services, at near parity with its premium offerings
OpenAI, the leading artificial intelligence research laboratory, has open-sourced two new language models under the Apache 2.0 license: gpt-oss-120b and gpt-oss-20b. Both models show impressive capabilities in reasoning tasks and tool use, setting a new standard in the field.
Capabilities
The gpt-oss-120b and gpt-oss-20b models boast strong instruction following, chain-of-thought reasoning, tool use, and structured outputs. They are compatible with OpenAI’s Responses API and popular inference frameworks like Transformers, vLLM, Llama.cpp, and Ollama.
The gpt-oss-120b offers stronger reasoning and better performance on complex tasks, while the gpt-oss-20b is optimised for speed and accessibility, making it suitable for low-cost or on-device inference. Both models use a Mixture-of-Experts (MoE) architecture, which activates only a small subset of parameters per forward pass to improve efficiency.
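To make the MoE idea concrete, here is a toy sketch of top-k expert routing, the mechanism that keeps active parameters per token far below the total parameter count. This is an illustrative NumPy example, not the actual gpt-oss implementation; the layer sizes and routing details are made up for demonstration.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Toy Mixture-of-Experts layer: route a token to its top-k experts.

    x:              (d,) token activation
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (n_experts, d) router projection

    Only top_k experts actually run, so the active parameters per token
    are a small fraction of the layer's total parameters.
    """
    logits = router_weights @ x                # score every expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    gates = np.exp(logits[top])
    gates = gates / gates.sum()                # softmax over the winners only
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), experts, router)
print(y.shape)  # (8,)
```

With 4 experts and top_k=2, only half the expert weights touch any given token; gpt-oss applies the same principle at far larger scale (~21B total vs. ~3.6B active parameters for the 20b model).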
Requirements for Running
The gpt-oss-20b requires approximately 16 GB of VRAM for inference, while the gpt-oss-120b requires 80 GB. Here's a breakdown of their specifications:
| Model | Total Parameters | Active Parameters per Token | Memory Requirement | Recommended Hardware | Notes |
|---|---|---|---|---|---|
| gpt-oss-20b | ~21B | ~3.6B | 16 GB VRAM | Single 16 GB GPU (e.g., consumer-grade GPU) | Ideal for on-device or low-cost server inference[2][3] |
| gpt-oss-120b | ~117B | ~5.1B | 80 GB VRAM | Single high-end GPU such as NVIDIA H100, or a multi-GPU setup | Runs on a single 80 GB H100; vLLM recommended for best performance[1][2][3] |
Training gpt-oss-120b took approximately 2.1 million H100 GPU hours, while gpt-oss-20b required roughly one-tenth as many[1]. Note that the 120B model's weights are large, so loading can be slow when limited by SSD read speed[5].
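A quick back-of-envelope calculation shows why these parameter counts fit the stated VRAM figures. This sketch assumes roughly 4-bit quantised weights with a small scaling overhead (≈4.25 bits per parameter); that bit width is an assumption for illustration, and activations, KV cache, and runtime buffers come on top, which is why the practical requirements are 16 GB and 80 GB rather than the raw weight footprint.

```python
def weight_memory_gb(total_params_b, bits_per_param=4.25):
    """Back-of-envelope weight footprint in GB for a quantised model.

    total_params_b: total parameters in billions
    bits_per_param: assumed average bits per weight (~4-bit quantisation
                    plus per-block scale overhead)
    """
    bytes_total = total_params_b * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

print(round(weight_memory_gb(21), 1))   # ~11.2 GB of weights for gpt-oss-20b
print(round(weight_memory_gb(117), 1))  # ~62.2 GB of weights for gpt-oss-120b
```

Under these assumptions the 20b weights fit comfortably on a 16 GB GPU and the 120b weights fit on a single 80 GB H100, matching the table above.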
Safety Precautions
Since these models are open-weight, developers must implement their own safeguards to prevent attackers from fine-tuning them to bypass safety restrictions[1][2][3].
Summary
The gpt-oss-20b, designed for consumer GPUs with 16GB VRAM or edge devices, offers a more accessible option for inference. In contrast, the gpt-oss-120b, requiring an 80GB GPU like NVIDIA H100 or a multi-GPU setup, delivers a higher reasoning capacity.
Both models are available on Hugging Face and support adjustable reasoning effort levels (low, medium, high). They have undergone evaluation by independent expert groups, demonstrating near-parity with OpenAI's o4-mini on reasoning benchmarks.
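One common way to select a reasoning effort level is through the system message, which gpt-oss's prompt format interprets as a directive ("Reasoning: low|medium|high"). The sketch below builds such a request for a local OpenAI-compatible server; the model name and the exact serving endpoint are placeholders for whatever stack (vLLM, Ollama, llama.cpp) you actually run.

```python
import json

def build_request(prompt, effort="medium"):
    """Build a chat-completions-style payload that requests a given
    reasoning effort via the system message. The model name below is a
    placeholder; use the identifier your local server registers."""
    assert effort in ("low", "medium", "high"), "unknown effort level"
    return {
        "model": "gpt-oss-20b",
        "messages": [
            {"role": "system", "content": f"Reasoning: {effort}"},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_request("Summarise Mixture-of-Experts routing in two sentences.",
                        effort="high")
print(json.dumps(payload, indent=2))
```

Higher effort levels trade latency for longer chains of thought, so "low" suits quick interactive use while "high" suits harder reasoning tasks.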
[1] OpenAI Blog: Link
[2] GitHub: Link
[3] Apache 2.0 License: Link
[4] Codeforces: Link
[5] AMD Ryzen™ AI and Radeon GPU Systems: Link