
High-speed AI functionality through collaboration: A look at OpenAI and NVIDIA's joint efforts

OpenAI's gpt-oss models, reportedly capable of processing up to 1.5 million tokens per second on NVIDIA hardware, debuted from a collaboration between OpenAI and NVIDIA built around speed and efficiency.


In a significant stride towards advancing artificial intelligence, OpenAI has introduced its gpt-oss models, designed for local deployment on various hardware platforms. These models, gpt-oss-20b and gpt-oss-120b, offer impressive capabilities in language processing, but they come with specific hardware requirements.

The gpt-oss-20b model requires a minimum of 16 GB of VRAM or unified memory, making it suitable for higher-end consumer GPUs or Apple Silicon Macs with sufficient memory. It can run locally on PCs, Macs, or edge devices with around 16 GB of RAM/VRAM, and performance improves with more memory. User reports indicate that 16 GB of RAM is the practical minimum for gpt-oss-20b on Macs, with more memory recommended for smoother use[3].

The gpt-oss-120b model, by contrast, demands a significantly larger memory capacity: at least 60 GB of VRAM or unified memory. It is designed to run efficiently on a single 80 GB GPU such as the NVIDIA H100. For AMD Ryzen AI and Radeon GPUs, 61 GB of VRAM is required, along with specific driver versions to ensure performance and stability[2].

Both models use MXFP4 quantization out of the box. The gpt-oss-20b model can run on 16 GB edge devices, and in numerous benchmarks, gpt-oss-120b performed at least on par with OpenAI's o4-mini model.
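As a rough illustration of why the memory floors fall where they do, the sketch below estimates weight storage under MXFP4. The figures are assumptions for illustration, not official numbers: roughly 4.25 bits per parameter (4-bit values plus a shared per-block scale), parameter counts of about 21B and 117B, and a flat 20% overhead for activations, KV cache, and runtime buffers.

```python
# Rough VRAM estimate for MXFP4-quantized weights.
# Assumptions (not official figures): ~4.25 bits per parameter
# (4-bit values plus a shared scale per block), plus a flat 20%
# overhead for activations, KV cache, and runtime buffers.

def mxfp4_weight_gb(n_params: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight storage in GB for a given parameter count."""
    return n_params * bits_per_param / 8 / 1e9

def rough_total_gb(n_params: float, overhead: float = 1.2) -> float:
    """Weights plus an assumed flat overhead factor."""
    return mxfp4_weight_gb(n_params) * overhead

for name, params in [("gpt-oss-20b", 21e9), ("gpt-oss-120b", 117e9)]:
    print(f"{name}: ~{rough_total_gb(params):.1f} GB")
```

Under these assumptions the 20b model lands comfortably inside a 16 GB budget and the 120b model inside a single 80 GB GPU, consistent with the requirements above.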

OpenAI has also developed a special protocol called "worst-case fine-tuning" for safety testing. The models support context inputs up to 128,000 tokens long and are compatible with various frameworks, including FlashInfer, Hugging Face, llama.cpp, Ollama, and vLLM. Microsoft enables local use on Windows devices via ONNX Runtime.
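A 128,000-token window still needs to be budgeted in practice. The sketch below checks whether a prompt fits while reserving room for the model's output; the 4-characters-per-token heuristic is a common rule of thumb and an assumption here, not the model's actual tokenizer, and the reserved output budget is likewise arbitrary.

```python
# Guarding a request against a 128,000-token context window.
# The 4-characters-per-token heuristic is a rough rule of thumb,
# not a real tokenizer; production code should count tokens with
# the model's own tokenizer.

CONTEXT_WINDOW = 128_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """True if the prompt leaves room for the reserved output budget."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("Summarize this report. " * 1000))
```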

Notably, the gpt-oss-120b model fits on a single 80 GB GPU. Attention is implemented through a mixed scheme that alternates dense layers with locally banded sparse ones. NVIDIA's TensorRT-LLM optimization stack is also supported.
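To make the "locally banded sparse" half of that mixed scheme concrete, the sketch below builds a causal sliding-window attention mask, where each position may only attend to a fixed window of recent tokens. The window size is chosen arbitrarily for illustration and is not the model's actual value.

```python
# Sketch of a causal, locally banded ("sliding-window") attention mask,
# the sparse half of a mixed dense/sparse attention scheme. The window
# size here is arbitrary for illustration, not the model's real setting.

def banded_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True where query i may attend to key j:
    only past/current positions within the last `window` tokens."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = banded_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row of the printout shows one query position: the band of `x` marks slides forward with the query, so memory and compute per layer grow with the window size rather than the full sequence length.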

OpenAI and NVIDIA collaborate with various platforms and hardware providers to ensure the models' compatibility and performance across a wide range of devices. Initial pilot projects with partners like AI Sweden and Snowflake are exploring on-site deployment of these models.

In summary, the gpt-oss models offer powerful language processing capabilities but require substantial hardware: around 16 GB of VRAM or unified memory for gpt-oss-20b (a high-end consumer GPU), and at least 60 GB for gpt-oss-120b (a large single GPU or a multi-GPU setup).

Energy consumption is another factor to weigh when deploying these models, especially gpt-oss-120b, whose large memory footprint ties it to power-hungry hardware. Sustained inference on a large single GPU or a multi-GPU setup can translate into substantial energy usage.
