Bug 5771 - Reserving generic resources
Status: RESOLVED FIXED
Alias: None
Product: Slurm
Classification: Unclassified
Component: Scheduling
Version: 23.11.x
Hardware: Linux Linux
Importance: --- 5 - Enhancement
Assignee: Danny Auble
Duplicates: 10934, 17226
Reported: 2018-09-24 15:01 MDT by UCF ARCC
Modified: 2023-10-30 10:47 MDT
Site: UCF
Version Fixed: 23.11.0rc1
Target Release: 23.11
DevPrio: 1 - Paid

Description UCF ARCC 2018-09-24 15:01:45 MDT
We know how to use scontrol to reserve nodes or cores, but is there a way to specifically reserve a gres ... in particular, a GPU?  What would the command be?

Thanks,
Paul.
Comment 2 Michael Hinton 2018-09-24 16:25:27 MDT
Hey Paul,

Currently, Slurm can’t explicitly create reservations for GRESs. Supported resources that can be reserved include cores, nodes, licenses, burst buffers, and features.

However, you may be able to work around this limitation by creating a reservation based on a feature. To do that, give the nodes that have GPUs a feature string such as “k80” and issue a command like the following:

    scontrol create reservation starttime=now nodes=all duration=15 features=k80 user=root 

Just be aware that “the reservation creation request can [only] identify... *one* feature that every selected node must contain.” See https://slurm.schedmd.com/reservations.html.
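
The workaround above has two steps: tag the GPU nodes with a feature, then reserve by that feature. A minimal dry-run sketch follows; it only prints the commands and runs nothing. The node list `gpu[01-04]` is a hypothetical name, and using `scontrol update` with `AvailableFeatures`/`ActiveFeatures` to set the feature is an assumption about how a site might do the tagging (features are more commonly set in slurm.conf, and changes made only through scontrol may not persist across a slurmctld restart).

```shell
# Dry run: each function only prints a command; nothing is executed.

# Print the command that tags a node list with a feature.
# $1 = node list (hypothetical, e.g. gpu[01-04]), $2 = feature name
tag_nodes_cmd() {
    printf 'scontrol update NodeName=%s AvailableFeatures=%s ActiveFeatures=%s\n' "$1" "$2" "$2"
}

# Print the reservation command quoted in this comment, with the feature
# and the duration (in minutes) as parameters.
# $1 = feature name, $2 = duration in minutes
feature_reservation_cmd() {
    printf 'scontrol create reservation starttime=now nodes=all duration=%s features=%s user=root\n' "$2" "$1"
}

tag_nodes_cmd 'gpu[01-04]' k80
feature_reservation_cmd k80 15
```

Printing the commands first makes it easy to review them before pasting them into a shell on the controller.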

Luckily, more advanced GPU reservation and scheduling is something we are actively developing for 19.05, so stay tuned!

And of course, for merely scheduling GPUs, take a look at https://slurm.schedmd.com/gres.html and https://slurm.schedmd.com/gres.conf.html.
Comment 3 Michael Hinton 2018-10-10 14:46:41 MDT
Feel free to reopen this bug if you have any follow-up questions.

Thanks,
Michael
Comment 4 Markus Kötter 2022-10-20 06:01:50 MDT
I do not see this working yet with 22.05.

scontrol create reservation partition=debug starttime=now duration=120 duration=120 user=root flags=maint nodes=ALL tres=gres/gpu:gtx=3

scontrol: error: TRES type 'gres/gpu:gtx' not supported with reservations
Comment 5 Jason Booth 2022-10-20 09:04:27 MDT
Hi Markus. We have an active enhancement request tracking this: bug 10934.

> I do not see this working yet with 22.05.

As mentioned previously, this is something Slurm does not currently support.

Quoting Tim's reply regarding future plans for reservations and gres:

> This remains an unsponsored development request, and as such, there is no 
> specific timeframe we expect to implement this on. If a SchedMD customer is 
> interested in sponsoring development then it'll be much more likely to move 
> forward, otherwise, like a lot of other tickets filed under '5 - Enhancement' - 
> this will remain in limbo.
Comment 6 Tim Wickberg 2022-12-14 14:03:39 MST
*** Bug 10934 has been marked as a duplicate of this bug. ***
Comment 9 Danny Auble 2023-01-30 15:04:45 MST
Markus,

Could you give me a few examples of what you would expect in your usage of this?

Our expectation is that a reservation will always include cores/nodes along with the other TRES requested.  Does this match your expectations/workflow as well?
Comment 10 Danny Auble 2023-06-19 14:02:34 MDT
Markus, I am starting to work on this and am looking for the further guidance mentioned in comment 9.

Please reply at your earliest convenience.
Comment 11 Markus Kötter 2023-06-20 01:38:06 MDT
(In reply to Danny Auble from comment #9)
> An expectation on our end is a reservation will always include cores/nodes
> along with the other TRES requested.  Does this meet with your
> expectations/workflow as well?

That depends on your definition of "cores".
I would not require CPU cores as part of the reservation, since the core count could default to the partition's DefCpuPerGPU when a GPU TRES is provided.

For nodes, I agree with your expectation, as in "one node with 4 gpus" or "4 nodes with one gpu each".
Comment 12 Jason Booth 2023-07-18 10:27:00 MDT
*** Bug 17226 has been marked as a duplicate of this bug. ***
Comment 17 Danny Auble 2023-10-17 01:28:12 MDT
Markus,

I think I have what is required in the master branch after commit 0c0ddd55f1.

Could you please test and verify that things are working as you would expect?

I added 2 options.

TRES=gres/gpu:1

or

TRESPerNode=gres/gpu:1

Both are case-insensitive.

Let me know whether you run into any problems.

Thanks!
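
As a rough sketch of how the new options might cover the two cases from comment 11 ("one node with 4 gpus" and "4 nodes with one gpu each"), the helper below prints a candidate command without running it. Combining `nodecnt` with `trespernode` in this way is an assumption, as are the `user=root` and `starttime=now` values; the `gres/gpu:N` count syntax is copied from this comment and should be checked against the scontrol man page for the installed release.

```shell
# Dry run: prints the command only; nothing is executed.
# $1 = node count, $2 = GPUs per node, $3 = duration in minutes
gpu_reservation_cmd() {
    printf 'scontrol create reservation starttime=now duration=%s user=root nodecnt=%s trespernode=gres/gpu:%s\n' "$3" "$1" "$2"
}

gpu_reservation_cmd 1 4 120   # "one node with 4 gpus"
gpu_reservation_cmd 4 1 120   # "4 nodes with one gpu each"
```

After creating a reservation, `scontrol show reservation` can be used to confirm what was actually reserved.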
Comment 18 Danny Auble 2023-10-30 10:46:49 MDT
Please reopen (or open a new bug) if this is not working as expected.  The current master branch should have all the functionality required.
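
Since the ticket lists 23.11.0rc1 as the fixed version and comment 4 shows 22.05 rejecting a GRES TRES, a script that creates such reservations may want to check the release first. The sketch below is one way to do that; the helper name is hypothetical, and it assumes the version string has the form `slurm <major>.<minor>.<patch>`, which is what `scontrol --version` is expected to print.

```shell
# Succeeds (exit status 0) if the given version is 23.11 or newer.
# $1 = version string such as "slurm 23.11.0" (assumed format)
supports_gres_reservations() {
    ver="${1#slurm }"      # drop the leading "slurm " if present
    major="${ver%%.*}"     # e.g. 23
    rest="${ver#*.}"       # e.g. 11.0
    minor="${rest%%.*}"    # e.g. 11
    [ "$major" -gt 23 ] || { [ "$major" -eq 23 ] && [ "$minor" -ge 11 ]; }
}

# Example with a fixed string; on a cluster, pass "$(scontrol --version)".
if supports_gres_reservations "slurm 22.05.3"; then
    echo "GRES reservations should be available"
else
    echo "GRES reservations need Slurm 23.11 or newer"
fi
```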