NVIDIA introduces DAQIRI to analyze CERN collision data in real time

30 Jun

The system moves high-speed detector data directly to GPUs, allowing scientists to examine more information before it is filtered out or sent for storage.

***NVIDIA DAQIRI connects high-speed scientific instruments and sensors directly with GPU-powered computing systems for real-time data processing. Image credit: NVIDIA***

NVIDIA has introduced DAQIRI, a high-speed data system designed to process information from scientific instruments as it is generated. Researchers from CERN openlab, UCL, and the University of Chicago are using it in a project focused on Large Hadron Collider (LHC) data.

DAQIRI, which stands for Data Acquisition for Integrated Real-time Instruments, connects detectors and sensors directly to graphics processing units, or GPUs, so data can be filtered, compressed, analyzed by AI, or used to trigger an immediate response without first being stored elsewhere.

One of its first research applications is A-GHOST, a collaboration examining data produced by the ATLAS detector at CERN, the European Organization for Nuclear Research. ATLAS records the results of particle collisions inside the Large Hadron Collider.

The project is preparing for the High-Luminosity Large Hadron Collider upgrade, which is expected to produce substantially more collision data. Researchers plan to test several types of AI model on prototype hardware connected directly to the incoming detector stream.

NVIDIA has published a DAQIRI software repository and technical documentation, but the announcement does not provide results from live use at the upgraded collider or a timetable for moving the A-GHOST prototype into regular operation.

DAQIRI processes data as experiments run

Large scientific instruments can create information faster than researchers can store and examine it. The usual process involves collecting data, saving it, and analyzing it later. That model becomes harder to maintain when detectors, scanners, and sensors are producing extremely large amounts of information every second.

DAQIRI is designed to move some of that analysis closer to the instrument itself. This approach, often called edge computing, means the data can be processed near the point where it is generated rather than being sent elsewhere before any decisions are made.

NVIDIA says DAQIRI can support activities including filtering data, selecting important events, compressing files, running AI models, and adjusting an experiment in response to changing conditions.

It forms part of the NVIDIA Holoscan Platform and can connect with NVIDIA software including TensorRT for running AI models and nvCOMP for data compression. DAQIRI can also send data to software developed specifically for an individual scientific instrument.
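The kind of processing described above can be pictured as a small chain of stages applied to each batch of data as it arrives: a cheap filter discards uninteresting events, and only the survivors are compressed and kept. The Python sketch below is purely illustrative and is not DAQIRI's API. The `Event` fields, the energy cut, and the function names are hypothetical, and the standard-library `zlib` module stands in for GPU compression such as nvCOMP.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    energy: float        # hypothetical per-event summary quantity used for the cut
    samples: list[int]   # hypothetical raw detector samples


def passes_filter(event: Event, min_energy: float) -> bool:
    """Cheap first-pass filter: drop events below an energy cut."""
    return event.energy >= min_energy


def compress(event: Event) -> bytes:
    """Serialize and compress one event (zlib is a CPU stand-in for nvCOMP)."""
    payload = json.dumps({"id": event.event_id, "samples": event.samples}).encode()
    return zlib.compress(payload)


def process_batch(batch: list[Event], min_energy: float) -> list[bytes]:
    """Filter a batch as it arrives and keep only the compressed survivors."""
    return [compress(e) for e in batch if passes_filter(e, min_energy)]


batch = [Event(1, 5.0, [0] * 100), Event(2, 12.5, [1] * 100), Event(3, 30.0, [2] * 100)]
kept = process_batch(batch, min_energy=10.0)
print(f"kept {len(kept)} of {len(batch)} events")
```

In a GPU pipeline the same stages would run on data already in GPU memory, with an inference step (for example via TensorRT) taking the place of the simple energy cut.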

The system routes information from a network interface card directly into GPU memory, avoiding some of the usual operating-system processing, which can introduce delays and consume additional CPU time.

NVIDIA says the architecture can handle data moving at hundreds of gigabits per second or more, depending on the hardware and configuration used. The announcement does not include independently verified performance results for the CERN project.
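A quick calculation shows why per-packet overhead matters at these speeds. The sketch below assumes a 400 Gb/s link and 8,000-byte packets. Both figures are illustrative and are not taken from NVIDIA's announcement, and the calculation ignores Ethernet framing and header overhead.

```python
def packet_budget(line_rate_gbps: float, packet_bytes: int) -> tuple[float, float]:
    """Return (packets per second, nanoseconds available per packet).

    Ignores framing and header overhead, so real budgets are slightly tighter.
    """
    bits_per_packet = packet_bytes * 8
    packets_per_second = line_rate_gbps * 1e9 / bits_per_packet
    ns_per_packet = 1e9 / packets_per_second
    return packets_per_second, ns_per_packet


# Illustrative assumption: 400 Gb/s link, 8,000-byte packets.
pps, ns = packet_budget(400, 8000)
print(f"{pps:,.0f} packets/s, {ns:.0f} ns per packet")
```

With roughly 160 nanoseconds available per packet, extra copies or system calls for every packet quickly consume the budget. That is the motivation for placing data straight into GPU memory.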

Developers can configure the data path using YAML configuration files and work with DAQIRI through C++ and Python interfaces. NVIDIA says this is intended to reduce the amount of specialist network programming required to connect instruments with GPU-based processing.

The system can be used with computing hardware ranging from smaller NVIDIA DGX Spark and IGX systems to larger server and rack-based infrastructure.

CERN project targets data normally rejected by ATLAS

The High-Luminosity Large Hadron Collider will increase the number of particle collisions produced for scientists to study.

ATLAS cannot permanently save every collision event. Its upgraded selection system is expected to pass up to one million events per second after its first stage, then reduce this to as many as 10,000 events per second for storage after its second stage.

Even with that increased capacity, more than 99 percent of collision events will still be rejected by the online selection system.

The A-GHOST project is investigating whether AI can examine more of that discarded data stream before it disappears from the normal processing route.

The research team plans to connect the programmable electronic boards used to receive detector information with a GPU-based processing system. This would allow AI models to inspect the full stream in real time and identify unusual or potentially valuable patterns.

The models due to be tested include convolutional autoencoders, convolutional neural networks designed to work across sequences of data, and transformer-based systems. In practical terms, the researchers are testing different ways for AI to spot patterns or events that the existing selection process may not preserve.
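Autoencoders are commonly used for this kind of anomaly search. The model learns to reconstruct typical events, and an event it reconstructs poorly is flagged as unusual. The sketch below shows only that scoring-and-threshold step and makes no claim about how A-GHOST implements it. The trained model is replaced by a hypothetical stand-in that always returns a fixed "typical" pattern, and the threshold value is arbitrary.

```python
from typing import Callable, Sequence


def reconstruction_error(event: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between an input event and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(event, reconstructed)) / len(event)


def flag_anomalies(
    events: Sequence[Sequence[float]],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    threshold: float,
) -> list[int]:
    """Return indices of events whose reconstruction error exceeds the threshold."""
    return [
        i for i, e in enumerate(events)
        if reconstruction_error(e, reconstruct(e)) > threshold
    ]


# Hypothetical stand-in for a trained autoencoder: it only "knows" one typical shape.
TYPICAL = [1.0, 2.0, 1.0]
reconstruct = lambda event: TYPICAL

events = [[1.0, 2.1, 0.9], [5.0, 0.0, 5.0], [1.1, 1.9, 1.0]]
print(flag_anomalies(events, reconstruct, threshold=0.5))
```

In practice the threshold is usually set from the distribution of errors on ordinary data, for example so that a fixed small fraction of normal events is flagged.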

Nikos Konstantinidis, Professor of Particle Physics and Data Intensive Science at UCL, wrote on LinkedIn that "this work paves the way towards revolutionising the depth at which we can investigate every collision event at the LHC, at 40 MHz".

A rate of 40 MHz corresponds to proton bunches crossing inside the detector about 40 million times per second, or once every 25 nanoseconds.
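The selection rates quoted in this article can be checked against each other with simple division. The figures below are the ones reported here: 40 million bunch crossings per second, one million events per second after the first stage, and up to 10,000 events per second stored.

```python
BUNCH_CROSSING_HZ = 40_000_000  # LHC bunch-crossing rate (40 MHz)
FIRST_STAGE_HZ = 1_000_000      # expected output of the upgraded first selection stage
STORED_HZ = 10_000              # upper figure for events stored per second

first_stage_reduction = BUNCH_CROSSING_HZ / FIRST_STAGE_HZ  # 40x
second_stage_reduction = FIRST_STAGE_HZ / STORED_HZ         # 100x
kept_fraction = STORED_HZ / BUNCH_CROSSING_HZ               # 1 in 4,000
rejected_percent = 100 * (1 - kept_fraction)

print(f"kept 1 in {BUNCH_CROSSING_HZ // STORED_HZ:,}; rejected {rejected_percent:.3f}%")
```

Roughly 99.975 percent of bunch crossings are discarded online, consistent with the "more than 99 percent" figure. This is the stream A-GHOST aims to inspect before it is lost.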

The A-GHOST work is an R&D project rather than a replacement for the established ATLAS data-selection system. The information provided does not show how accurately the proposed AI models will identify useful events, how many additional events could be retained, or what level of human review will be required.

Prototype testing will determine the next step

DAQIRI is intended for uses beyond particle physics. NVIDIA identifies possible applications involving photon research facilities, industrial CT scanners, and high-bandwidth radio systems, where instruments also produce large data streams that may need to be processed immediately.

The core proposition is that researchers should not always have to store everything before deciding what is important. DAQIRI allows them to analyze incoming data, keep selected information for deeper study, and potentially adjust an experiment while it is still running.
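One simple way a running system can adapt is through feedback on its own selection cut. It raises the cut when too many events pass and lowers it when too few do, keeping the stored volume close to a budget. The sketch below is a generic proportional controller and is not a description of DAQIRI or ATLAS. The event scores, target, and gain are all hypothetical.

```python
def count_accepted(scores: list[float], threshold: float) -> int:
    """Number of events whose score exceeds the current cut."""
    return sum(1 for s in scores if s > threshold)


def update_threshold(threshold: float, accepted: int, target: int, gain: float = 0.1) -> float:
    """Proportional update: scale the cut by the relative excess of accepted events."""
    error = (accepted - target) / target
    return max(0.0, threshold * (1 + gain * error))


# Hypothetical stream: every batch contains scores 0..99; budget is 10 events per batch.
scores = [float(s) for s in range(100)]
threshold, target = 50.0, 10
for _ in range(20):
    accepted = count_accepted(scores, threshold)
    threshold = update_threshold(threshold, accepted, target)

print(f"threshold={threshold:.2f}, accepted per batch={count_accepted(scores, threshold)}")
```

A production system would also need safeguards, such as limits on how fast the cut may move, so that a burst of unusual data is not simply tuned away.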

That could reduce the volume sent to major computing facilities, but it also places more responsibility on the software and AI models making decisions at the point of collection. If relevant data is filtered out incorrectly, it may not be available for later analysis.

Geraint Rees, Vice-Provost for Research, Innovation and Global Engagement at UCL, wrote on LinkedIn: "Great to see our UCL partnership with NVIDIA extending to this exciting cutting-edge collaboration at the very edge of what can be achieved in real-time data acquisition and processing…"

The CERN openlab, UCL, and University of Chicago team will next test the planned AI models with prototype hardware. NVIDIA has made DAQIRI code, configuration examples, and developer documentation available through GitHub, but no date has been provided for operational use within the High-Luminosity Large Hadron Collider.