How to Monitor Jobs

From Statistics Cluster
Revision as of 15:43, 3 November 2016 by Barnes

Check the state of the job

Check the jobs of a specific user

Once jobs have been submitted to the cluster, you can monitor them by running the following command in a terminal:

[username@computer ~]$ condor_q -submitter <username> | less

The output has the following columns:

  • ID: the job ID (cluster.process)
  • OWNER: the owner of the job
  • SUBMITTED: the date and time the job was submitted
  • RUN_TIME: how long the job has been running
  • ST: the current status (R running, H held, I idle)
  • SIZE: the job size (memory, in MB)
  • CMD: the program name

This is useful for checking the status of your own jobs.
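
When many jobs are queued, a per-state count can be easier to read than the full listing. The following is a minimal sketch, not a cluster-provided tool: it assumes condor_q's -af (autoformat) option is available and relies on the standard JobStatus codes (1 idle, 2 running, 5 held). The function name summarize_status is hypothetical.

```shell
# Count jobs per state from HTCondor JobStatus codes.
# Reads one numeric code per line on stdin:
#   1 = idle (I), 2 = running (R), 5 = held (H)
# Other codes are ignored in this sketch.
summarize_status() {
    awk '
        $1 == 1 { idle++ }
        $1 == 2 { running++ }
        $1 == 5 { held++ }
        END { printf "R=%d I=%d H=%d\n", running, idle, held }
    '
}

# Intended use on a machine with HTCondor installed (assumption):
#   condor_q -submitter <username> -af JobStatus | summarize_status
```

The counting is kept in a separate function so that it can be tried on sample input without access to the queue.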

Check all jobs

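To see all of the jobs in the queue, run the command below. Newer HTCondor releases list only your own jobs by default; if that is the case, add the -allusers option.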

condor_q | less

Check which machine the job is running on

Another useful command is condor_status, which reports information about the cluster machines:

[username@computer ~]$ condor_status [-r] | grep stat | less

This lists the machine resources in the cluster; the grep stat filter keeps only the lines that contain "stat". If the -r option is supplied, only machines with running jobs are shown.
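
To map individual jobs to the machines they run on, the job queue can also be queried directly. The sketch below is an illustration under stated assumptions: it assumes condor_q's -af option and the job attributes ClusterId, ProcId and RemoteHost, and that jobs without a host report the value "undefined". The function name format_job_hosts is hypothetical.

```shell
# Reads lines of "ClusterId ProcId RemoteHost" on stdin and prints
# "<cluster>.<proc> <host>" for every job that has a host assigned.
# Lines whose host field is "undefined" (job not running) are skipped.
format_job_hosts() {
    awk 'NF >= 3 && $3 != "undefined" { printf "%s.%s %s\n", $1, $2, $3 }'
}

# Intended use on a machine with HTCondor installed (assumption):
#   condor_q -submitter <username> -af ClusterId ProcId RemoteHost | format_job_hosts
```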

If there are any concerns about a specific job, please contact the main administrator.

Check the job as it runs

Check stdout

If a job is running, you can run the following command to see the tail of its stdout file:

condor_tail <job_id>

Check stderr

To check for errors, view the tail of the stderr file instead:

condor_tail -no-stdout -stderr <job_id>
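
In a script, it can be handy to test whether the stderr tail contains anything at all. The following is a small sketch of that idea. The helper name has_output is hypothetical, and it only checks for non-blank text; it does not interpret the messages.

```shell
# Succeeds (exit status 0) if stdin contains at least one
# non-whitespace character, and fails otherwise.
has_output() {
    grep -q '[^[:space:]]'
}

# Intended use on a machine with HTCondor installed (assumption):
#   if condor_tail -no-stdout -stderr <job_id> | has_output; then
#       echo "stderr is not empty"
#   fi
```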