
Deliverables


Draft document

Deliverable number: D4.1
Deliverable Title: Report on the current technical elements of data analysis at each partner site
Lead Beneficiary: ILL
Dissemination Level: Public
Due date of delivery: Month 12
Contributors: Jamie Hall (ILL), Stuart Caunt (ILL), William Turner (ILL)

Premise

As part of this task, we distributed a survey to all PaNOSC partners (CERIC-ERIC, ILL, XFEL, ESS, ELI and ESRF) with the aim of building up a view of the data analysis needs and services at each facility, and of providing a basis for a set of requirements for the development of services for WP4. The partners from the ExPaNDS project were also contacted, so we also have results for the ISIS Neutron and Muon Facility.

The survey was split into four main sections, plus a final section for other information:

Scientists and Data

Data generation, user community and scientific nature of each facility, for today's situation and as forecast for 2023

Data analysis and reduction

Tools and services concerning data analysis and reduction

Technology

Determines the IT infrastructure available for data analysis purposes

Security

Provides a global view of the security requirements and solutions at each facility

Other information

Additional information that could be useful for the development of analysis services for PaNOSC.

The results give a good overview of the various tools and services currently in use at each facility. The report will be broken down into:

  • Common tools and services
  • Tools that the facilities would like to explore in more depth
  • Emerging tools that would be beneficial to look into for developing services

Description of Work

Add description of work here

Survey Results

Scientists and Data

1. How many scientific visitors do you receive per year?

Institute | Current situation | Forecast 2023
CERIC-ERIC | 400 | 500
ELI

The ELI Facilities are currently performing user-assisted commissioning experiments. They will gradually open peer-reviewed access to the experimental facilities, the expectation being to have steady-state operations in 2023.

In 2019, ELI Beamlines (ELI-BL) had a number of scientific visitors for both commissioning calls (not very relevant regarding data: E1/TERESA, anticipated E3 “first light”) and a pilot call (L1-E1) with ca 4-5 different groups. The number of external users at ELI-ALPS is of the same order.

ELI-ALPS: circa 300 visitors per year, it being understood that the groups behind these experiments include many more scientists than the 300 indicated here.

ELI-BL: 280 users
ESS | 0 (now) | ESS is scheduled to start operating in 2023; visitor numbers are not yet known.
ESRF | 6548 (2018). Some users come several times; the estimated number of unique individuals visiting the ESRF in 2018 is about 3000. | 10,000
ILL | 2000 visitors last year. | 2200 visitors.
XFEL

Variable – 650+ (local) and 100+ (remote) in 2018, instrument commissioning is still ongoing, extrapolating from that estimates are 3,000+ users a year during steady operation.

-
ISIS

In 2018 ISIS had 2512 scientific visitors. There are more beamlines being built to fill out remaining available space within the target station 2 building, hence the number is likely to be higher than 2512 in 2023.

-


2. How many experiments are performed per year?

Institute | Current situation | Forecast 2023
CERIC-ERIC

There were 187 experiments approved in 2018, a 16% increase compared to 2017 (CERIC annual report 2018).

Expected increase of 20% (220 experiments).
ELI

ELI-ALPS: 95 experiments

ELI-BL: 60 experiments
ESS

0 (now).

~100
ESRF

2018: 924 experiments.

1200
ILL | 600 | 700
XFEL

Currently ~3,000 hours of beam time are available per year, so the number depends on how long each experiment runs: around 50 to 150 experiments a year.


ISIS

In 2018 ISIS had 1085 proposals from 32 countries and 264 Xpress proposals.



3. How much data is produced per year?

Institute | Current situation | Forecast 2023
CERIC-ERIC | 1 PB per year. | Expected to be around 50 PB per year.
ELI

ELI-ALPS: 50 TB total

ELI-BL: 400-500TB/year

10+ PB per year

ESS | 0 (now). | < 500 TB
ESRF | 9 PB.

50 PB. The error bar on this number is large. It could be much more if our IT infrastructure is capable of dealing with it.

ILL | 250 TB. | 500 TB.
XFEL

Around 10 PB this year; estimated around 100 PB once in full operation.


ISIS

About ~15 TB of raw data from April 2017 to April 2018. With the neutron imaging beamline in full operation and improvements in detector technology, this will increase significantly; a guess is ~30 TB by 2023.



4. What are the mean and maximum data file sizes for an experiment?

Institute | Current situation | Forecast 2023
CERIC-ERIC | Max: dozens of GB (tomography experiments at the Syrmep beamline); Min: dozens of KB (experiments such as Scanning Transmission X-ray Microscopy at the TwinMic beamline); Mean: dozens of MB. | Max: dozens of GB; Min: dozens of KB; Mean: dozens of MB.
ELI

ELI-ALPS:

- Min: 2.2 GB
- Max: 9 TB
- Mean: 1.04 TB

ELI-BL:

- Min: 2-4MB
- Max: 2GB


ESS

Currently undetermined.


ESRF

Data file size is typically from 1MB to 16MB.

Yet an experiment consists of many datasets, and a dataset consists of many files.

For instance, mean and range for the (few) already publicly available experimental data:

Mean: 500GB, Range: [2.4-3902] GB

But that can be much more.


ILL | Max: 70 TB; Mean: dozens of GB. | Max: 100 TB; Mean: dozens of GB.
XFEL

The largest proposal is 856 TB, mean 70 TB, standard deviation 137 TB. Proposals are made up of runs (largest directory is 30 TB, mean is 350 GB, standard deviation is 1 TB), and runs are made up of multiple data files (12 GB each).


ISIS

Range from hundreds of KB to ~63 GB currently.
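XFEL's answer above quotes mean and standard-deviation figures over proposals and runs. As an illustration only, statistics of this kind can be gathered from a run directory with a short stdlib script (the directory layout and `*.h5` pattern are assumptions, not any facility's actual convention):

```python
from pathlib import Path
from statistics import mean, pstdev

def file_size_stats(directory, pattern="*.h5"):
    """Return (mean, population std dev, max) of matching file sizes, in bytes."""
    sizes = [f.stat().st_size for f in Path(directory).glob(pattern) if f.is_file()]
    if not sizes:
        return (0, 0, 0)
    return (mean(sizes), pstdev(sizes), max(sizes))
```

Run over each proposal directory in turn, the same helper would reproduce per-proposal summaries like those quoted in the table.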


5. Do you have any open data (freely accessible by anyone)?

Institute | Current situation | Forecast 2023
CERIC-ERIC | No, we don't have any open data available. | We aim to have open data following the implementation of the CERIC Data Policy, currently under development in collaboration with the PaNOSC WP2 activities.
ELI | No. | This is something ELI-BL wants to start as a pilot project in late 2020, but it depends on resource availability and prioritisation.
ESS | Yes.
ESRF

Yes, see https://data.esrf.fr


ILL

Yes, see https://data.ill.eu

2019: 1300 datasets

The data policy has been in place since 2012. Data that is outside of the embargo period is made available as Open Data.

2023: 3500 datasets

XFEL

Currently all data is under embargo, but some users have uploaded parts of their data to CXIDB where it can be downloaded by the public.


ISIS

Yes, see https://www.isis.stfc.ac.uk/Pages/Data-Policy.aspx


6. How long do you keep data for after an experiment?

Institute | Current situation | Forecast 2023
CERIC-ERIC | The new scientific Data Policy states that we will keep data for at most 10 years. | Same as today.
ELI | TBD. | As long as possible, on a best-effort basis (data policy to be approved in the course of 2020).
ESS

Current ESS Data Policy states at least 5 years.


ESRF

50 days on disk after the end of the experiment, one year on backup tapes, part of the data for 10 years in a tape archive.

30 days on disk after the end of the experiment, all data for 10 years in a tape archive.

ILL | Forever. | Forever.
XFEL

Anticipated minimum of 5 years, aim for 10 years for key data.

-
ISIS | Perpetuity. | -

7. How long is the embargo period for data produced at your facility?

Institute | Current situation | Forecast 2023
CERIC-ERIC | The CERIC-ERIC Data Policy is under development; once it is ready, the embargo period will be 3 years. | 3 years.
ELI | N/A | 3 years (TBC).
ESS

Current ESS Data Policy states 3 years.

-
ESRF

3 years, extendable upon request.

-
ILL | 3 years. Extended to 5 if no request has been made. | -
XFEL

3 years.

-
ISIS

3 years.

-
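All facilities that answered converge on a 3-year embargo, with the ILL extending to 5 years in some cases. As a sketch of how a release date following such a policy could be computed (the function name is ours; the 3- and 5-year figures follow the table):

```python
from datetime import date

def embargo_end(experiment_end: date, years: int = 3) -> date:
    """Date on which the embargo lifts, `years` after the experiment ends.

    Falls back to 28 February when the start date is 29 February and the
    target year is not a leap year.
    """
    try:
        return experiment_end.replace(year=experiment_end.year + years)
    except ValueError:  # 29 Feb in a non-leap target year
        return experiment_end.replace(year=experiment_end.year + years, day=28)
```

For example, `embargo_end(date(2019, 6, 1), years=5)` gives the extended ILL-style release date.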

8. What data formats are used at your facility (nexus, institute-specific, ...)?

Institute | Current situation | Forecast 2023
CERIC-ERIC

HDF5 based formats (including nexus), csv, ssv, txt, tiff.

We are working to make the NeXus format the standard data container for all instruments.
ELI

ELI-ALPS: data formats are experiment specific.

ELI-BL: no real strategy has been developed yet.

We see an interest in HDF5 and it is something we use (and like). It is highly unlikely we will develop our own data formats.
ESS

NEXUS.


ESRF

EDF, Specfile, HDF5, NeXus+HDF5, other data formats depending on detectors.

Majority of HDF5, NeXus+HDF5, and other data formats depending on detectors.

ILL | NeXus for all instruments except nuclear physics; ROOT for nuclear physics. | -
XFEL

Institute-specific HDF5.


ISIS

NeXus; instruments that are not yet using this format use an ISIS binary .raw format.
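Since most facilities report HDF5-based formats (NeXus included) alongside legacy formats such as the ISIS binary .raw, a cheap way to tell them apart is the 8-byte HDF5 signature at the start of the file. A minimal stdlib sketch (helper name is ours; strictly, the HDF5 superblock can also sit at 512-byte-multiple offsets when a user block is present, so checking offset 0 only covers typical files):

```python
# Magic bytes at offset 0 of an ordinary HDF5 file
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def is_hdf5(path):
    """True if the file starts with the HDF5 format signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE
```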


9. Does your facility generate DOIs for your experimental data? If yes, when are they generated?

Institute | Current situation | Forecast 2023
CERIC-ERIC

DataCite has been chosen as our DOI provider. We are in the testing phase with their API and should start to generate DOIs by 2020.

We will generate DOIs for all experimental data through DataCite.
ELI

Not yet, but this is of course our intention.


ESS

During data acquisition, DOIs will be assigned as soon as experiment data is first written to disk.


ESRF

Yes. DOIs are generated at the beginning of the acquisition.

2018: Partially deployed on the beamlines.

Available for all beamlines.

ILL | Yes, after an experiment has been carried out. We are only minting DOIs for proposals. | By 2023 we should have DOIs for subsets of data.
XFEL

Yes. Generated along with metadata catalogue entry, only made public once the embargo is lifted.


ISIS

Yes, currently for all except Xpress experiments. They are generated before cycles start.
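CERIC-ERIC and others mint DOIs through DataCite. As an illustration of what a minting request body against the DataCite REST API can look like (the prefix 10.5072 is DataCite's test prefix; the suffix, titles and URL below are invented, and real deployments should follow DataCite's documentation for required fields):

```python
def datacite_doi_payload(prefix, suffix, title, publisher, year, url):
    """Build a JSON-serialisable request body for DataCite's REST API (POST /dois)."""
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": f"{prefix}/{suffix}",
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                "url": url,
                "types": {"resourceTypeGeneral": "Dataset"},
            },
        }
    }
```

The resulting dictionary would be serialised to JSON and POSTed with the facility's repository credentials; generating the DOI at proposal time (as the ILL does) or at first write (as ESS plans) is then just a question of when this call is made.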


10. Does your facility have a User Portal (eg to submit proposals)?

Institute | Current situation | Forecast 2023
CERIC-ERIC

Yes. CERIC-ERIC uses the Virtual Unified Office (VUO) portal to manage the whole experiment life cycle.

By 2023 we hope to integrate our User Portal with the partner facilities of PaNOSC.
ELI | The ELI facilities are currently using ad-hoc solutions for proposal submission in the framework of commissioning calls. | A user portal will be developed in the coming years, with a first version anticipated in the second half of 2020, once ELI ERIC is established and launches its first call for peer-reviewed access.
ESS

Not yet, but this is planned to exist by 2023.

-
ESRF

Yes, https://smis.esrf.fr/.

-
ILL | Yes, https://userclub.ill.eu/userclub/. | -
XFEL

Yes.

-
ISIS | Yes. | -

11. How would a user download all of their data at your facility?

Institute | Current situation | Forecast 2023
CERIC-ERIC

The VUO portal allows remote data browsing and downloading while data is still under embargo. After this period, data can be downloaded by contacting the IT group.

We will keep data downloading through the VUO and will integrate it with the common portal under development in PaNOSC.
ELI

ELI-ALPS: FTP

ELI-BL: Catapult (https://www.catapultsoft.com/)

TBD (eg Globus).
ESS

Currently undetermined, but likely through a GridFTP-like solution or similar.


ESRF | 2018: Via the Internet at https://data.esrf.fr/ or through the ICAT API for ~10% of the data; portable media (disks) for ~90% of the data.

Hopefully more over the Internet.

ILL

Download over HTTP (https://data.ill.eu) or SFTP transfer (dt.ill.eu).

Perhaps using OneData or Globus (undecided for the moment).
XFEL

In theory they wouldn't. However, if they want to, we offer Globus for large downloads; alternatively, for a small subset of files, rsync or SFTP would be an option.

-
ISIS

Through the ISIS Data catalogue.

-

Data Analysis and Reduction

1. Do you provide Jupyter Notebook/JupyterLab services to visitors?

Institute | Answer
CERIC-ERIC

Today: We have a prototype installation of JupyterHub which has been used for use-case testing. Forecast 2023: We plan to offer a production JupyterHub environment (including sample Jupyter Notebooks, tutorials and GPUs) to visitors.

ELI

Not at present. ELI-ALPS has a Jupyter Notebook set-up which internal staff are testing. The plan, within the context of PaNOSC, is to integrate this technology into our offering and promote its use internally and to external users.

ESS

The JupyterHub service is currently in the testing stage and is expected to be in production by the end of 2020.

ESRF

Yes: some beamlines have JupyterHub servers running, and jupyterhub+kubernetes, binderhub and jupyterhub+slurm have also been made available recently.

ILL | Yes. We have recently deployed JupyterHub. It is currently in a pilot phase.
XFEL

Yes. JupyterHub is provided as a service to the users.

ISIS

No, not directly, but we provide machines with Python installed where users can set up Jupyter Notebooks.
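Most partners above run or are piloting JupyterHub. For reference, a deployment of this kind typically starts from a `jupyterhub_config.py` along these lines; this is a sketch only, and the authenticator, spawner, image and limits chosen here are illustrative, not any facility's actual configuration:

```python
# jupyterhub_config.py -- minimal sketch, all values illustrative
c.JupyterHub.bind_url = "https://jupyter.example-facility.eu"

# Delegate login to an existing identity provider (e.g. Keycloak, via OAuth)
c.JupyterHub.authenticator_class = "oauthenticator.generic.GenericOAuthenticator"

# Spawn each user's notebook server in its own container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"

# Cap per-user resources
c.Spawner.mem_limit = "4G"
c.Spawner.cpu_limit = 2
```

Swapping the spawner class is how the ESRF-style jupyterhub+kubernetes or jupyterhub+slurm variants mentioned above are obtained.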

2. How many staff and visiting scientists/end-users currently use Jupyter Notebook/JupyterLab at your facility?

Institute | Answer
CERIC-ERIC

Today: About 5 of our staff currently use these technologies. Forecast 2023: We expect these numbers to increase once the infrastructure for offering such services in a production environment is ready.

ELI

ELI-ALPS: 2 staff members

ELI-BL: 2 staff members

ESS | See previous answer.
ESRF

No precise figures, but only a small number (10+) up to now, though growing.

ILL | Not many for the moment, but we hope this number will grow as notebooks are made available to ILL users.
XFEL

~150 during one week on the JupyterHub, plus an unaccounted number of users running their own Jupyter Notebook instances.

ISIS

This is not recorded.

3. Do external users currently have remote access to data analysis/reduction services (remote desktops, grid computing, shell...)?

Institute | Answer
CERIC-ERIC

Today: Yes. A remote data browsing and visualisation web application allows users to access and visualise HDF5 files through the VUO portal. Remote shell access is also available through the same portal. Forecast 2023: We aim to provide Jupyter Notebook data analysis/reduction services.

ELI

Not at the moment.

ESS

External users of the cluster can access it via SSH.

ESRF

2018: Remote desktop provided through NoMachine (nx.esrf.fr), else connection using SSH.

ILL | Yes. They have external access to remote desktops (VISA) and to the HPC cluster.
XFEL

Yes. SSH and remote desktop access to registered users.

ISIS

Only for those instruments that are using the new ISIS Data Analysis as a Service platform. By 2023 this will have increased.

4. Do you offer data analysis/reduction training at your facility?

Institute | Answer
CERIC-ERIC

Today: We offer on-site support for data analysis/reduction during the experiment, but no formal training. Forecast 2023: With the developments of PaNOSC we will be able to offer local and remote training more easily.

ELI

Some internal training on image processing has been initiated at ELI-BL and will repeat in the upcoming year. However, this is more intended to enable scientists to automate processes (alignment) without CS help.

ESS

The data analysis, reduction and modelling group offer training on specific software applications they develop.

ESRF

2018: We provide some Python training oriented towards data analysis (training material: https://github.com/silx-kit/silx-training). 2023: Training material should be made available, possibly complemented by tutoring services.

ILL | Not specifically at the ILL, but we offer training as part of software development collaborations.
XFEL

Occasional in person training for groups, one-to-one training on request, written documentation.

ISIS

Yes, through annual Neutron training school and Muon training school. Plus e-learning material, documentation of software used on beamlines and through training by instrument scientists.

5. Do your scientists support users with their data analysis/reduction after their experiments?

Institute | Answer
CERIC-ERIC

Today: Yes. Most of the time, the scientific computing team and the responsible beamline scientist offer support for the data analysis/reduction required by the users.

Forecast 2023: We will keep this support.

ELI

At ELI-BL we already see this informally in E1 and partially with the commissioning experiments. There is a massive effort on the reconstruction of experimental TOF data, and scientists seem to be doing that collaboratively with their users.

ESS

We do not yet run experiments. We expect this to be the case in the future.

ESRF

2018: Yes, but not officially; this is done as an additional informal service by many of the local contacts on the beamlines. 2023: A clear mandate for DaaS should be in place and define the services provided by the ESRF.

ILL | Yes, but not officially; this is often done by the local contact.
XFEL

Yes. Support and advice is provided for our tools and software.

ISIS

In some cases if required.

6. Are data analysis/reduction results preserved at your facility?

Institute | Answer
CERIC-ERIC

In some cases, the user has a data analysis directory available during the experiment in which they can carry out and preserve analysis results. Such data is also accessible through the VUO portal. Still, there is no common policy defined as of today.

ELI

Not at the moment, because data analysis is mostly handled by users and not yet integrated into a processing pipeline. However, we anticipate storing results of some of those reconstructions (especially at ELI-BL, where we will provide analysis as part of the DAQ chain for massive amounts of raw data that we might not be interested in saving).

ESS

We do not yet have a system set up for this. We expect that this will be done to some extent, at least for auto-pipelined data products.

ESRF

Yes for in-house experiments, because these storage areas are still covered by a backup. And no for external users, this is currently not part of the data policy. 2023: It may be necessary to archive also processed data, if economically feasible.

ILL | Yes. Analysis and reduction results are stored on the central storage along with the raw data.
XFEL

Yes. Processed data is stored; calibrated data is not always stored, but can be re-created on demand.

ISIS

Not in perpetuity currently.

7. How long do you keep the analysis results?

Institute | Answer
CERIC-ERIC

Again, there is no data policy defined for this yet. Data is kept according to each beamline's and research team's needs.

ELI

To be defined in the future ELI ERIC data policy; we would consider and treat analysis results like user data in the data policy.

ESS

Currently undetermined.

ESRF

One year for in-house experiments.

ILL | Forever.
XFEL

6 months for calibrated data, user files are saved for 24 months.

ISIS

A policy for how long to keep analysed results created through the ISIS Data Analysis as a Service platform is currently being developed.

8. How much data is produced after data reduction / analysis?

Institute | Answer
CERIC-ERIC

In some cases, the same amount as the raw data.

ELI

The steady-state estimate for the overall amount of data archived at ELI is 10+ PB per year.

Significant reduction will be required for some high-repetition-rate experiments (for example, E1 at ELI-BL, where reduction is from 20-30 Gb/s to 1 Gb/s).

ESS

Currently undetermined. For 2023, <500TB is expected.

ESRF

N/A, this highly depends on the kind of experiment/analysis.

ILL | This depends strongly on the kind of experiment/analysis; it could represent more than the raw data.
XFEL

We do not currently have data reduction set up; the volume of analysis results varies from a few KB to TB.

ISIS

It varies a lot from instrument to instrument but, to a first approximation, it is about the same as the raw data volume.

9. Can external users remotely access experimental logs after the experiment?

Institute | Answer
CERIC-ERIC

Today: An electronic logbook is available for users during the experiments. The user can place this e-logbook in the experiment results directory and then access it through the user portal (VUO). Forecast 2023: We aim to have interactive logbooks using Jupyter Notebook technologies, allowing users to reproduce the results described in the logbook directly from it.

ELI

ELI-ALPS: yes (custom logbook application with external access). ELI-BL: Not yet.

ESS

Yes. Through an interactive chat interface (SciChat).

ESRF

This will become possible with the implementation of the data policy on the beamlines and the electronic logbook.

ILL

Yes. See https://logs.ill.fr.

XFEL

Yes. Metadata catalogue and elogs can be accessed as long as account is active.

ISIS

The experimental logs are included in the NeXus files.

Technology

1. What operating systems for data analysis does your facility support (please specify any Linux distributions)?

Institute | Answer
CERIC-ERIC

Ubuntu (16.04, 18.04), CentOS 7, Windows.

ELI

CentOS 7, 8(2020).

ESS

CentOS 7.3 is provided on the HPC cluster. CentOS 7.6 and Ubuntu 18.04 are used for virtual machines.

ESRF

Debian 8, 9, 10.

ILL | Ubuntu 16.04 and 18.04. We also support OS X and Windows for some software.
XFEL

CentOS 7; Ubuntu 18.

ISIS

ISIS's support includes Windows, Red Hat, CentOS 7 and Scientific Linux.

2. What is the main Linux distribution used for the server infrastructure at your facility?

Institute | Answer
CERIC-ERIC

CentOS 7.

ELI

ELI-ALPS: Ubuntu

ELI BL: CentOS 7 and some NI Linux RT (for control applications).
ESS

CentOS7.x.

ESRF | Debian.
ILL | Debian.
XFEL

CentOS 7.

ISIS

Red Hat and derivatives.

3. What compute infrastructure is dedicated to scientific computation at your facility?

Institute | Answer
CERIC-ERIC

Tesla units (GPUs), an HPC cluster and a local cloud.

ELI | ECLIPSE HPC cluster and an upcoming (2020) HPC cluster; DAQ servers with 14 blades (24 cores each) and 768 GB RAM as a memory buffer.
ESS

For scientific computation we have an HPC cluster (see below), as well as a virtualisation environment used for infrastructure VMs and for prototyping the data analysis environment; this consists of three virtualisation hosts (32 cores, 384 GB RAM each).

ESRF

Compute clusters & dedicated machines for online data analysis.

ILL | Compute cluster and dedicated machines for online data analysis.
XFEL

General purpose HPC cluster (Maxwell Cluster).

ISIS

SCARF and, more recently, the SCD Cloud, as well as dedicated individual machines.

4. Do you have an HPC cluster? If yes, what is the size of the cluster? I.e. number of cores, memory etc.

Institute | Answer
CERIC-ERIC

Yes. It has 16 worker nodes (8 nodes at 2.50 GHz, 8 nodes at 3.40 GHz) interconnected via 10 Gb Ethernet, with 132 GB RAM, running CentOS 7 and Slurm.

ELI

ELI-ALPS:
- CPU computing cluster with 5 nodes, 36 cores and 768 GB RAM each.
- GPU computing cluster with 5 nodes, 20 cores, 128 GB RAM and 2 NVIDIA Tesla K80 card each.

ELI-BL:

ECLIPSE HPC Cluster (1344 cores and 10.75 TB RAM in 84 nodes), Upcoming (2020) HPC Cluster (~8000 cores, ~50 TB RAM in ~324 nodes).
ESS

Modest HPC cluster – 1400 cores, 5GB/core on average, QDR IB (40 Gbps), no GPUs.

ESRF

Yes, 2018: we have currently 3200 Intel cores and 28 GPUs. 2023: a substantial upgrade is required, ideally to 20k Intel (or AMD) cores and 150 GPUs.

ILL | Yes. The HPC cluster has ~800 cores, with each node having between 32 GB and 256 GB of RAM, and access to 30 TB of storage. The HPC cluster is mostly used for data simulation; for data analysis and reduction, dedicated servers are available.
XFEL

EuXFEL owns a partition of the Maxwell cluster: 256 GB-1.5 TB of memory per node, 40-80 cores per node, HDR/EDR IB backbone. Currently around 300 nodes, ~18,000 cores, ~150 TB RAM.

ISIS

SCARF. See https://www.scarf.rl.ac.uk/hardware.

5. How is your scientific software distributed? (CernVM-FS, custom Linux repositories, release downloads, artefact repositories...)

Institute | Answer
CERIC-ERIC

Internal GitLab, GitHub and custom Linux repositories.

ELI

No distribution.

ESS

Conda, github, dedicated websites, custom linux repositories.

ESRF

As Debian packages and as Python packages on PyPI, and eventually on conda (when others package them).

ILL | Via Git repositories, websites and NFS shares.
XFEL

Custom repositories, shared file system installations (GPFS, AFS), CVMFS available but currently not used for photon science.

ISIS

Some as cross-platform binary distributions, such as Mantid (https://www.mantidproject.org/Main_Page). The forward-looking plan is to distribute more through cloud services.

6. What emerging tools are you currently looking into or are interested in using?

Institute | Answer
CERIC-ERIC

JupyterHub, container technologies (Docker, Singularity), Data Analysis as a Service, remote desktop technologies, K8s.

ELI

We are looking a little into containerisation instead of classical virtualisation, but this is more to help with CS development and deployment on diverse hardware than something to be provided as a service.

ESS

Dask (distributed NumPy arrays), pybind11, Singularity.

ESRF | Singularity.
ILL | Kubernetes, Singularity and Ansible.
XFEL

Hard to answer due to how closely linked our infrastructure is to DESY's. At XFEL we are interested in BinderHub, Singularity, FPGAs, ML/AI for data reduction, and the Common Workflow Language. From the DESY side: CPack on CVMFS, nextflow, airflow, airavata, SciCat, k8s on bare (HPC) metal, Podman, Qiskit and related, Keycloak, Slurm cloud bindings, ...

ISIS

Use of the cloud services for users.

7. Do you currently provide any remote desktop services to users internally at the institute and to users after their experiments?

Institute | Answer
CERIC-ERIC

VNC for users during the experiment.

ELI

No. Currently nothing is anticipated; however, it is not excluded if needs emerge.

ESS

Currently developers can use VNC to access build machines or test software on VMs.

ESRF

NoMachine (nx.esrf.fr).

ILL

Yes. Using https://visa.ill.eu.

XFEL

Full graphical remote access for all users via FastX.

ISIS

ISIS Data Analysis as a Service (isis.analysis.stfc.ac.uk).

8. Are you currently running any Data Analysis as a Service pilot projects (such as VISA/CalipsoPlus at the ILL)?

Institute | Answer
CERIC-ERIC

Yes, CalipsoPlus.

ELI | No.
ESS

No; we are currently investigating this, considering something integrated with SciCat and Jupyter Notebooks.

ESRF

jupyterhub + kubernetes, binderhub and jupyterhub+slurm.

ILL | Currently using VISA.
XFEL

None at EuXFEL; CalipsoPlus is hosted at DESY.

ISIS

ISIS Data Analysis as a Service.

9. Do you have a cloud infrastructure in place (OpenStack, Nebula, Orchestration etc.)?

Institute | Answer
CERIC-ERIC

Yes.

ELI

Not at the moment, some interest - but no concrete plans.

ESS

We use a combination of oVirt and Foreman to deploy VMs for users/infrastructure.

ESRF

We are currently strongly limited by our human resources. The initial idea was to offer OpenStack, but this has been put on hold for the time being.

ILL | OpenStack, VMware and Proxmox.
XFEL

EuXFEL and DESY share some infrastructure, OpenStack is available via DESY and can technically be used at EuXFEL.

ISIS | OpenStack.

10. What are you using for machine virtualisation (VMWare, KVM…) ?

Institute | Answer
CERIC-ERIC

Proxmox (KVM).

ELI

VMware; no virtualisation on the HPC.

ESS

Ovirt.

ESRF | N/A.
ILL | KVM and VMware.
XFEL

Varies, both used at EuXFEL (online cluster/remote desktop); additional software at DESY.

ISIS | -

11. What are you using for containerisation (Docker, Singularity…)?

Institute | Answer
CERIC-ERIC

We are mainly using Docker, but are also investigating Singularity.

ELI

ELI-ALPS: Docker

ELI-BL: Singularity, Docker for early experiments.
ESS

Currently Docker containers orchestrated by Kubernetes.

ESRF | Docker.
ILL | Docker.
XFEL

Docker, Singularity. Considering Podman.

ISIS

Docker and Singularity.

12. Which protocol(s) are you using to access experimental data (NFS, SMBFS, SFTP....)?

InstituteAnswer
CERIC-ERIC

CEPH, NFS, SFTP, SMBFS.

ELI

ELI-ALPS: SMBFS

ELI-BL: NFS/SMBFS and a custom ZMQ based streaming solution; however we’ve just tendered a software-defined storage (commissioning in 2020) and plan to support a number of protocols depending on scientific needs and capacity to integrate.
ESS | NFS.
ESRF

NFS, SMBFS, GPFS.

ILL | NFS, SMB, GPFS, HTTP and SFTP.
XFEL

GPFS, BeeGFS locally. NFS exports, FTPS for remote copying, Globus online, ...

ISIS | -

13. Do you provide functions-as-a-service?

Institute | Answer
CERIC-ERIC

No.

ELI

No.

ESS

No, though we have previously investigated this as part of the EOSC pilot.

ESRF | No.
ILL | No, but we are currently looking into using something like OpenWhisk or Kubeless.
XFEL

Partially. EuXFEL does not, however some FaaS from DESY can be used.

ISIS | -

14. What is the incoming and outgoing bandwidth at your facility?

Institute | Answer
CERIC-ERIC

Incoming and outgoing are 10Gbps, moving to 100Gbps next year (2020).

ELI

ELI-ALPS: 10G redundant currently, with an option to extend to 100G.

ELI-BL: 10G dedicated to user data; anticipated extension to 100G.
ESS

Currently 10G (DMSC/Copenhagen – Danish Research Network).

ESRF

2018: A shared 10Gbps Internet connection (to the metropolitan network TIGRE, then RENATER, then GEANT). 2023: 100Gbps.

ILL

2018: A shared 10Gbps Internet connection (to the metropolitan network, then RENATER, then GEANT). 2023: 100Gbps.

XFEL

Incoming: 2x50Gb. Outgoing: 2x50Gb.

ISIS | -
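To put these link speeds in context against the data volumes reported earlier, a back-of-the-envelope transfer-time check is useful. A sketch (decimal units; real throughput will be below line rate, which the illustrative `efficiency` factor approximates):

```python
def transfer_days(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move `data_bytes` over a `link_gbps` link at a given efficiency."""
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400

# e.g. 1 PB over a 10 Gbps link at full line rate:
# transfer_days(1e15, 10)  # ~9.26 days
```

This is why facilities forecasting tens of PB per year also forecast 100G links, and why the ESRF still moved ~90% of its data on portable disks in 2018.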

15. Do you use cloud providers for hosting external services or is everything hosted internally?

Institute | Answer
CERIC-ERIC

Everything is hosted internally.

ELI

ELI-ALPS: Currently everything is on-premises; there are plans to move non-core services to the cloud.
ELI-BL: Hosting is currently 99% internal, but there are no general reservations against cloud providers.

ESS

Everything is provided internally.

ESRF

2018: everything is hosted on-site. 2023: scale-out with commercial cloud providers.

ILL | 2018: Everything related to scientific data is hosted internally.
XFEL

Internally.

ISIS

Currently internal, but there are plans to expand to external services as well.

16. If you were to give users access to remote virtual machines, would you give them root access?

Institute | Answer
CERIC-ERIC

No.

ELI

ELI-ALPS: Only under special circumstances

ELI-BL: Possible, not required up to now.
ESS

We currently give developers root access on dev machines; for scientific users this is currently undetermined (but likely not, if possible).

ESRFNo.
ILL | No.
XFEL

Depends on the user, likely not. Restrictions apply (e.g. no direct access to experimental data).

ISIS

In general no; only for specific testing purposes, such as software testing of specific critical facility software.

17. How would you govern the computational resources required (i.e. CPUs, memory) for a given experiment?

Institute | Answer
CERIC-ERIC

The computational resources are managed by the IT group according to each experiment's needs. In future, with a JupyterHub installation for instance, we will be able to predefine the available resources so that users can choose the amount accordingly.

ELI

No established governance yet.

ESS

Currently undetermined – likely through VM environment allocation rules.

ESRF

2018: free access, first come first served. 2023: for data-intensive experiments, a Data Management Plan and corresponding resource allocation.

ILL | 2019 (pilot phase): free access, first come first served.
XFEL

Not governed. Beamline staff makes a rough prediction, which can be adjusted on demand.

ISIS

Depends on the complexity of the instrument and the experiments.

18. Can users choose the amount of resources required for data analysis at the facility?

Institute | Answer
CERIC-ERIC

Not currently. See the previous question for an explanation.

ELI

No constraints via governance, limitations by the hardware/capacity taken into account as part of the feasibility assessment of the user proposal.

ESS

Currently undetermined, but likely yes. Whereas users will be provided with sufficient resources during the active experiment, post-experiment analysis will likely be offered a selection of configurations (hopefully the amount of allocated resources can be coupled to the proposal, but this is currently undetermined).

ESRF

See previous question.

ILL | Yes, using VISA.
XFEL

Not really. Depends also on the type of compute pipelines. In principle all users have access to the entire HPC cluster.

ISIS

Yes, but only a limited number of choices are given.

19. Do you have any particular quotas in place related to computation processing (e.g. number of hours, cores, …)?

Institute: Answer
CERIC-ERIC

No.

ELI

No established policy yet.

ESS

Currently, we have Slurm fair-share allocation implemented on our HPC.

ESRF

2018: no; access is free, with monitoring and complaints management. 2023: resource allocation methods must be found that allow us to regulate usage and ensure everybody can work satisfactorily.

ILL: Maximum of three running jobs.
XFEL

Varying quota on number of concurrent jobs, number of nodes per job. Jobs have mostly exclusive access to nodes. A small number of nodes are reserved for each experiment during the beam time, plus dedicated compute hardware close to the experimental hutch (“online cluster”).

ISIS

No.
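Policies of the kind reported here (ESS's fair-share allocation, ILL's three-running-job cap, XFEL's per-job limits) map naturally onto Slurm's multifactor priority plugin and QOS limits. A hypothetical admin-side sketch; the QOS name and weight values are illustrative, not any facility's actual configuration:

```
# slurm.conf -- enable fair-share weighting in job priority
PriorityType=priority/multifactor
PriorityWeightFairshare=10000
PriorityDecayHalfLife=7-0

# Cap each user at three running jobs via a QOS (cf. the ILL answer):
#   sacctmgr add qos userlimits
#   sacctmgr modify qos userlimits set MaxJobsPerUser=3
#   sacctmgr modify user someuser set qos=userlimits
```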

20. Do you have a job submission system for scientific computing resources (Sun Grid Engine, Torque, Slurm, OAR, ...)?

Institute: Answer
CERIC-ERIC

We use Slurm as the job submission system for the HPC cluster.

ELI

Slurm.

ESS: Slurm.
ESRF

Currently changing from OAR to Slurm.

ILL: Torque. Currently looking into using Slurm.
XFEL: Slurm.
ISIS

SCARF uses Slurm.
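All the facilities surveyed either run Slurm or are migrating to it, so a portable batch script is a realistic common denominator. A minimal sketch; the partition name, resource sizes and the reduction script are hypothetical:

```shell
#!/bin/bash
#SBATCH --job-name=reduction     # job name shown in the queue
#SBATCH --partition=compute      # hypothetical partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Run a (hypothetical) reduction script on the experiment data
python reduce.py /data/experiment/raw.h5
```

Submitted with `sbatch reduce.slurm`; only the partition names and quotas would differ between sites.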

Security

1. Which authentication provider(s) are used (e.g. Umbrella, Keycloak) for user access (internal/external)?

Institute: Answer
CERIC-ERIC

Local (VUO, LDAP). We also have Umbrella authentication.

ELI

No decision about the solution for external users has yet been taken.

ESS

Currently, no federation AAI is implemented – all AAA is internal (AD/LDAP).

ESRF

Umbrella and Keycloak.

ILL

Umbrella and Keycloak.

XFEL: LDAP.
ISIS

Umbrella is supported.

2. Which authentication protocols do you use to authenticate your users (e.g. OpenID Connect, CAS, SAML)?

Institute: Answer
CERIC-ERIC

Local based on LDAP.

ELI

No decision about the solution for external users has yet been taken.

ESS

Currently, no federation AAI is implemented.

ESRF: SAML.
ILL: SAML, CAS, OpenID Connect.
XFEL

LDAP/Kerberos.

ISIS: -
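Several partners report SAML and OpenID Connect; with OpenID Connect, the provider (e.g. Keycloak) delivers the user's identity as a signed JWT whose payload carries the claims a data service consumes. A minimal, stdlib-only sketch of the token structure, using a toy token with a placeholder signature; a real service must verify the signature against the provider's published keys rather than trust the payload:

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload of a JWT such as an OpenID Connect
    ID token. Illustration only: real deployments must verify the
    signature (e.g. against the provider's JWKS endpoint)."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def b64url(obj) -> str:
    """Encode a dict as unpadded base64url JSON (one JWT segment)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()

# Build a toy token to show the header.payload.signature structure;
# issuer, subject and audience values are hypothetical.
toy_token = ".".join([
    b64url({"alg": "RS256", "typ": "JWT"}),
    b64url({"iss": "https://sso.example.org", "sub": "user123", "aud": "data-portal"}),
    "signature-placeholder",
])

claims = decode_jwt_payload(toy_token)
print(claims["sub"])  # user123
```

The same claims (subject, issuer, audience) are what a facility portal would map onto its local account and group model.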

3. Can people who have never visited your facility access data or analysis services (i.e. to access open data)? If so, how do they authenticate?

Institute: Answer
CERIC-ERIC

Yes, but they have to be registered in the user portal (VUO).

ELI

No decision about the solution for external users has yet been taken.

ESS

Currently, no federation AAI is implemented.

ESRF

No; so far the available services are not visible from outside, and when they are made available users will have to authenticate. However, getting an account will be simple.

ILL: Using the Keycloak SSO.
XFEL

If users are part of an experiment they can request an account; once approved, this account has access to the data from any experiments they have participated in. In the future, once the embargoes on past experimental data start lifting, some system to request a ‘public’ account will be needed.

ISIS

Open data is accessible to everybody. For non-open data, access is only given to users who register with the user office, and only for experiments they have participated in.

4. Do you provide a means for people to apply for access to open data and analysis services?

Institute: Answer
CERIC-ERIC

It is all connected to the proposal.

ELI

No decision about the solution for external users has yet been taken.

ESS

Currently undetermined.

ESRF

https://data.esrf.fr allows access to open data, but there is no plan to give access to analysis services publicly.

ILL: Yes. Users can create an account and access data using https://data.ill.eu; however, there is currently no way to apply for access to compute services. This is being looked into.
XFEL: No.
ISIS: -

5. What IT services are provided to external users (ssh, sftp, specific web applications)?

Institute: Answer
CERIC-ERIC

ssh, sftp, data browsing and downloading through user portal (VUO). Data visualization (HDF5 files) through web application. Remote desktop via Guacamole.

ELI

ELI-ALPS: SFTP and an eLearning web application

ELI-BL: data transfer for specific commissioning experiments.
ESS

Currently undetermined.

ESRF: -
ILL: 2019: SSH, SFTP, JupyterHub, VISA, data catalogue.
XFEL

SSH, SFTP, JupyterHub, Globus, remote desktop, metadata catalogue.

ISIS

For example, the ISIS data catalogue and ISIS Data Analysis as a Service.

6. Do you have any security restrictions at your facility (i.e. is a VPN connection required before accessing services)?

Institute: Answer
CERIC-ERIC

For some services such as ssh a VPN connection is required. For data browsing, downloading and visualisation a valid VUO account is required.

ELI

ELI-ALPS: internal services (if any) can be accessed only via VPN.

ELI-BL: VPN connection is required for most internal services; very limited and controlled outside interfaces (data transfer).
ESS

Not to access external resources; otherwise, SSH (or VPN for internal users).

ESRF

A moving target. IT security has to be enforced. VPN access will most likely disappear.

ILL: No. A VPN connection is not required to access data services.
XFEL: No.
ISIS

Authentication with the user office account.

7. How are users associated with their experimental data (e.g. LDAP groups, POSIX attributes…)?

Institute: Answer
CERIC-ERIC

LDAP groups.

ELI

There is no structured solution yet; one is anticipated at ELI Beamlines in 2020, with access constrained to specific experiments via LDAP groups, network separation and localized storage.

ESS

LDAP/POSIX.

ESRF: ACLs.
ILL: LDAP groups and extended NFS ACL attributes.
XFEL: No.
ISIS: -
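The group-plus-ACL pattern reported by ILL and ESRF can be sketched in admin commands. A hypothetical fragment; the group name, NFSv4 realm and data path are illustrative:

```
# One LDAP/POSIX group per experiment; participants are added to the
# group, and the group is granted read access to the experiment's data.

# POSIX ACL variant (cf. the ESRF answer):
setfacl -R -m g:exp_1_23_456:rX /data/exp_1_23_456

# NFSv4 extended-ACL variant (cf. the ILL answer):
nfs4_setfacl -R -a A:g:exp_1_23_456@facility.eu:RX /data/exp_1_23_456
```

Either way, the association between a user and their experimental data reduces to group membership, which the authentication layer (LDAP/AD) already manages.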

8. Do scientific visitors have a user home at your facility (linux home, windows home…)?

Institute: Answer
CERIC-ERIC

Yes. Each user has a Unix user ID which is used to access their user home and other resources available in the IT infrastructure.

ELI

Not yet.

ESS

Yes, for cluster users (neutron facility not yet in operation, though).

ESRF: No.
ILL: Yes.
XFEL: Yes.
ISIS

Those who use the Data Analysis as a Service have one.

Other information

Institute: Answer
CERIC-ERIC

Ideally, all services should be easily accessible via the web through the internal portal based on the VUO, integrated with the PaNOSC common portal and, in future, with the EOSC.

ELI: -
ESS

Please note that the replies herein relate to ESS, which is not currently in operation and does not yet have a user programme (a test user programme starts in 2023). The replies are therefore not representative of active neutron facilities and should, in general, be used with care in that context (e.g. care should be taken that the expected data rates are not included in averages together with those of other facilities). In addition, values, policies and procedures are still at the planning stage and very much subject to change.

ESRF: -
ILL: -
XFEL

As mentioned a few times, a lot of EuXFEL infrastructure is closely linked to DESY, so answering questions about what nodes are available to us and what software services are run at our facility can be difficult, due to the blurry line between what counts as ‘ours’ and what counts as DESY’s.

Our services are underpinned by a number of publicly available libraries and tools, including:

  • pyFAI

  • HDF5/H5py

  • H5glance

  • Silx

  • Jupyter Notebook/Jupyter ecosystem tools

  • ZeroMQ

  • MessagePack

  • Many SciPy ecosystem libraries – numpy, matplotlib, xarray, pandas, etc.

  • PyQt5/PyQtGraph

ISIS: -

Next steps

Add next steps here

Conclusion

Add conclusion here
