# PERFORMANCE EVALUATION OF DEVELOPED GEM-BASED X-RAY DIAGNOSTIC SYSTEM\*

PAWEŁ LINCZUK<sup>a,b</sup>, RAFAŁ D. KRAWCZYK<sup>b</sup>, ANDRZEJ WOJEŃSKI<sup>b</sup>
WOJCIECH ZABOŁOTNY<sup>b</sup>, KRZYSZTOF POŹNIAK<sup>b</sup>
MARYNA CHERNYSHOVA<sup>a</sup>, TOMASZ CZARSKI<sup>a</sup>
PIOTR KOLASIŃSKI<sup>b</sup>, GRZEGORZ KASPROWICZ<sup>b</sup>, MICHAŁ GĄSKA<sup>b</sup>
EWA KOWALSKA-STRZECIWILK<sup>a</sup>, KAROL MALINOWSKI<sup>a</sup>

<sup>a</sup>Institute of Plasma Physics and Laser Microfusion, 23 Hery, 01-497 Warszawa, Poland
<sup>b</sup>Institute of Electronic Systems, Warsaw University of Technology, 15/19 Nowowiejska, 00-665 Warszawa, Poland

(Received July 23, 2018)

The volume of data acquired from Gas Electron Multiplier (GEM) detectors increases with emerging demands for soft X-ray measurement of hot plasma. To meet the expectations of high-quality measurement, a high-throughput, low-latency processing system is required. This paper presents an overview and details of the current state of the developed system. The solution consists of dedicated acquisition hardware, FPGA preprocessing and High-Performance Computing devices used for numerical processing. Low-latency data transmission is provided by PCI Express technology together with a dedicated Linux driver.

DOI:10.5506/APhysPolBSupp.11.637

## 1. Introduction

Processing of Soft X-Ray (SXR) measurement data, especially of high-flux streams, is a challenging task. Parallel acquisition and processing methods have to be used to fulfill many requirements regarding data quality, processing time and available throughput. Gaining the best understanding of the studied hot plasma properties can require raw data acquisition, which may cause transmission problems and further difficulties in the later storage and processing stages.

<sup>\*</sup> Presented at the II NICA Days 2017 Conference associated with the II Slow Control Warsaw 2017, Warsaw, Poland, November 6–10, 2017.

The developed system addresses measurements for hot plasma diagnostics based on the radiation of plasma impurities. Detection is performed by a Gas Electron Multiplier (GEM) detector designed to measure SXR in the range of 0.1–20 keV [1]. Further research on the developed system aims to utilize High-Performance Computing (HPC) devices [2, 3] to handle such problems. This work shows the capabilities of this novel system regarding acquisition speed, to be used later in drawing physics conclusions.

## 2. System overview

From the communication point of view, the system can be divided into three main parts. The first is the detection part, consisting of the GEM detector. The current configuration is prepared to work with 128 channels. The hardware is planned to be versatile and scalable, so the number of channels may be increased or reduced, depending on the actual needs.

The raw data produced by that part requires further preprocessing, which is performed by the second part. It uses FPGAs for signal conditioning and for the detection of events (sets of samples related to a single trigger occurrence). The preprocessing part consists of two Xilinx Artix-7 XC7A200T FPGAs. The third, computational part contains one Intel Xeon E5-2630 v3 CPU (8 cores, 16 threads).

To transfer the data between the FPGA and the CPU memory, the PCIe bus is used. The Artix-7 device used during the tests is capable of transmitting data via a PCIe Gen 2 x4 link. The connection was made through an 8-to-1 PCIe switch installed on an Intel S2600CW mainboard running CentOS 6.7 with a vanilla 2.6.32 kernel (newer kernels were tested too [4]) and 64 GB of DDR4 memory.

More details of the hardware part of the system, including system structure, hardware and FPGA firmware design can be found in [5, 6]. A discussion about the variety of considered devices for the computational part of the system is presented in [2].

The latest work [7] presents the detailed implementation of fast communication based on DMA. It uses a dedicated driver based on the XDMA Linux kernel driver [8]. The maximum latency and the achievable throughput for 128 kB–4 MB chunks of data with this solution were measured [4] and found to be 200 $\mu$s and 1.65 GB/s, respectively, per FPGA. In the solution used in the tests, each event causes the data gathered from all channels to be transferred in raw format.

## 3. Test description

To measure the system capabilities, dedicated software was implemented. Two tests were performed. The first checked the maximum throughput of the system with data downloaded from all the channels simultaneously. For this purpose, noise acquisition was performed so that the measurement system gathered as much data as possible. The difference between the FPGA-calculated timestamps included in the data was examined to find the maximum throughput.
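The timestamp comparison in the first test can be illustrated with a minimal Python sketch. The function name `min_event_spacing` and the sample timestamps are hypothetical and are not taken from the measurement software. The sketch assumes timestamps are integer clock-cycle counts in acquisition order and ignores counter wrap-around.

```python
def min_event_spacing(timestamps):
    """Return the smallest difference between consecutive timestamps.

    Timestamps are assumed to be integer clock-cycle counts sorted in
    acquisition order. The smallest spacing corresponds to the highest
    event rate observed in the data set.
    """
    if len(timestamps) < 2:
        raise ValueError("at least two timestamps are needed")
    # Pair each timestamp with its successor and take the difference.
    diffs = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return min(diffs)


# Synthetic timestamps for illustration only (not measured data).
example_timestamps = [0, 500, 900, 1250]
spacing = min_event_spacing(example_timestamps)
print(spacing)  # smallest spacing, in clock cycles
```

Taking the minimum rather than the mean is a design choice of this sketch: with the system saturated by noise acquisition, the shortest spacing indicates how closely events can follow each other.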

The second test checked the dead time of the system and was based on the data produced by the first one. The FPGA adds a timestamp to each event at the time of triggering. The dead time is obtained by subtracting the duration of sample acquisition (40 clock cycles) from the difference between the timestamps of consecutive events.

## 4. Test results

The first test showed that, during the measurement, the time difference between consecutive events was 328 clock cycles. The number of events handled in 1 s can be calculated as follows:

$$\left\lfloor \frac{1\,\text{s}}{\text{clock cycle period} \times \text{time difference between events}} \right\rfloor = \left\lfloor \frac{10^{9}\,\text{ns}}{12.5\,\text{ns} \times 328} \right\rfloor = 243\,902 \text{ events}. \tag{1}$$
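The arithmetic of Eq. (1) can be checked with a short Python sketch. The constant and function names are illustrative. The values 12.5 ns and 328 clock cycles are those quoted in the text.

```python
import math

CLOCK_PERIOD_NS = 12.5      # clock cycle period used in Eq. (1)
EVENT_SPACING_CYCLES = 328  # measured spacing between consecutive events


def events_per_second(clock_period_ns, spacing_cycles):
    """Number of whole events that fit into one second."""
    event_period_ns = clock_period_ns * spacing_cycles  # time per event in ns
    return math.floor(1e9 / event_period_ns)            # 1 s = 1e9 ns


rate = events_per_second(CLOCK_PERIOD_NS, EVENT_SPACING_CYCLES)
print(rate)  # 243902
```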

The above corresponds approximately to a stream of $19.1\,\mathrm{Gbit/s}$ or $2.38\,\mathrm{GB/s}$ of data. Assuming an exemplary storage capacity of 32 GB, the maximum measurement time can be calculated for the case when the computational part is not processing the data. The resulting time is $13.45\,\mathrm{s}$.
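The measurement time quoted above follows from dividing the storage capacity by the data rate. A minimal sketch, assuming both quantities use the same definition of a gigabyte (names are illustrative):

```python
STORAGE_GB = 32.0           # exemplary storage capacity from the text
THROUGHPUT_GB_PER_S = 2.38  # data stream rate from the text


def max_measurement_time_s(storage_gb, throughput_gb_per_s):
    """Time until the storage is full when data is only stored, not processed."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return storage_gb / throughput_gb_per_s


duration_s = max_measurement_time_s(STORAGE_GB, THROUGHPUT_GB_PER_S)
print(f"{duration_s:.2f} s")  # 13.45 s
```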

Considering the difference of timestamps between the events found in the first test, the dead time can be calculated as follows:

$$(328 - 40) \times \text{clock cycle period} = 288 \times 12.5\,\text{ns} = 3.6\,\mu\text{s}. \tag{2}$$
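Eq. (2) can be reproduced with the following sketch, using the values from the text (328 cycles between events, 40 cycles of sample acquisition, 12.5 ns clock period). The function name is illustrative.

```python
def dead_time_ns(spacing_cycles, acquisition_cycles, clock_period_ns):
    """Dead time: event spacing minus acquisition time, converted to ns."""
    if spacing_cycles < acquisition_cycles:
        raise ValueError("event spacing cannot be shorter than the acquisition")
    return (spacing_cycles - acquisition_cycles) * clock_period_ns


dead_ns = dead_time_ns(spacing_cycles=328, acquisition_cycles=40, clock_period_ns=12.5)
print(dead_ns / 1000.0, "us")  # 3.6 us
```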

This is the currently measured value, and it is affected by the current configuration of the AXI bus in the FPGA. The system can be adjusted by decreasing the size of a single event, transmitting data from meaningful channels only, and reducing the dead time. This should allow the event rate to be increased if more throughput is needed.

## 5. Conclusions

Within this work, the technical capabilities of the system were measured. The current volume of data, with the related data stream rate and dead time, was calculated. The developed system is capable of continuous measurement, limited only by the amount of available RAM and by CPU-side numerical computations. The achievable throughput (2.38 GB/s, 243 902 events/s, 13.45 s of continuous measurement with storage of 3.27 million raw events) is sufficient for initial plasma measurements.

The proposed solution is promising due to the possibility of raw data acquisition, which can provide more information about the measured phenomena than similar systems that output only processed data. The developed system is a versatile solution that can be easily adapted (by changing the analog acquisition part) to solve other complex measurement problems.

This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014–2018 under grant agreement No. 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission. This scientific work was partly supported by the Polish Ministry of Science and Higher Education within the framework of the scientific financial resources in the year 2017 allocated for the realization of the international co-financed project.

## REFERENCES

- [1] D. Mazon et al., JINST 11, C08006 (2016).
- [2] R. Krawczyk et al., Acta Phys. Pol. B Proc. Suppl. 9, 257 (2016).
- [3] P. Linczuk et al., Int. J. Electron. Telecommun. 63, 323 (2017).
- [4] P. Linczuk et al., Evaluation of FPGA to PC Feedback Loop, in: Proc. SPIE 10445, Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, p. 104454B, DOI: 10.1117/12.2280947.
- [5] A. Wojenski et al., Fusion Eng. Des. 123, 727 (2017).
- [6] A. Wojenski et al., JINST 11, C11035 (2016).
- [7] W.M. Zabołotny, DMA Implementations for FPGA-based Data Acquisition Systems, in: Proc. SPIE 10445, Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, p. 1044548, DOI: 10.1117/12.2280937.
- [8] DMA for PCI Express (PCIe) Subsystem, https://www.xilinx.com/products/intellectual-property/pcie-dma.html