Differences
This shows you the differences between two versions of the page.
| troubleshooting [2022/06/22 14:44] – old revision restored (2022/06/22 10:27) 35.156.240.123 | troubleshooting [2022/06/22 14:48] (current) – old revision restored (2022/06/17 22:18) 35.156.240.123 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Troubleshooting Jobs ====== | ||
| + | ===== Job stuck in the queue and won't run: ===== | ||
| + | |||
| + | Show the queue to get JOBID and general information about what is running: | ||
| + | $ showq | ||
| + | |||
| + | Active jobs are jobs that are currently running. Eligible jobs are jobs that are waiting in line to run as soon as resources are available. Blocked jobs are jobs that cannot run for some reason; usually it is because the job has requested more resources than are allowed. | ||
| + | |||
| + | Check the job for errors, or to see why it isn't running: | ||
| + | $ sudo -i | ||
| + | # checkjob *JOBID* | ||
| + | |||
| + | If the job is blocked, the checkjob output will tell you why. Usually it's because there are not enough resources available, or because the job has asked for more resources than it is allowed to use. | ||
| + | Cancel the job and resubmit it with new resource requirements: | ||
| + | # mjobctl -c *JOBID* | ||
| + | |||
| + | ===== Job is in a running state, but not making progress: ===== | ||
| + | |||
| + | Cancel the job. The user will need to troubleshoot the script/ | ||
| + | |||
| + | ===== No jobs running: ===== | ||
| + | |||
| + | Check that the scheduler is running/ | ||
| + | # systemctl status moab | ||
| + | # systemctl restart moab | ||
| + | If Moab won't start, check the moab logs for clues: | ||
| + | # tail -300 / | ||
| + | |||
| + | Check the status of the nodes: | ||
| + | # pbsnodes | ||
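
The full pbsnodes listing can be long on a large cluster. The following is a minimal sketch of a filter that prints only the nodes whose state includes "down" or "offline". The function name unhealthy_nodes is hypothetical, and the sketch assumes the usual Torque layout in which each node name sits on an unindented line, followed by indented "key = value" attribute lines. Adjust the pattern if your pbsnodes output differs.

```shell
# Hypothetical helper: list nodes whose state includes "down" or "offline".
# Reads pbsnodes-style output on stdin (assumed layout: an unindented node
# name, then indented "key = value" attribute lines).
# Intended use (as root): pbsnodes | unhealthy_nodes
unhealthy_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }                  # unindented line: remember the node name
        $1 == "state" && $3 ~ /down|offline/ {   # indented "state = ..." line
            print node, $3                       # print node name and its state
        }
    '
}
```

Run it as `pbsnodes | unhealthy_nodes`. An empty result means no node reported a down or offline state. Torque's `pbsnodes -l` gives a similar built-in summary.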