
FWGrid Cluster
Overview
The FWGrid cluster currently consists of 96 dual Opteron and 224 dual Xeon computing
backends: 94 dual 1.6GHz Opteron nodes with 2GB
RAM, 2 dual 2.2GHz Opteron nodes with 16GB RAM, and 224 dual Xeon nodes (64 x 2.8GHz and 160 x 3.2GHz)
with 4GB RAM. In aggregate, FWGrid provides 640 processors, 1.15 Terabytes of
memory and 160 Terabytes of storage. Please refer to
server configuration for configuration details.
How to use it
- access - Cluster access and job execution on FWGrid
are managed by the cluster frontend,
fwgrid-compute-server-0.ucsd.edu
(fwg-cs0.ucsd.edu).
All of the computing resources are managed by the SGE job scheduler.
Users should log in to the computing frontend, fwg-cs0.ucsd.edu,
and submit SGE jobs from this server.
The cluster frontend supports SSH protocol version 2.
Direct SSH access to the cluster computing nodes is not
permitted
unless the user has jobs scheduled to run on those nodes.
The only exception is the two nodes that run in
x86_64 mode. Because these two nodes are unique,
user login (via SSH) to them is enabled so that
users can compile and debug 64-bit binaries. However,
users must keep in mind that these
two nodes are still managed by the SGE system running
on the computing frontend. Therefore, all jobs that run on
the two 64-bit nodes must still
be submitted to the SGE scheduler.
- logins and passwords - Graduate students, faculty, and
staff who have access to CSE's general-use computing
facilities can request access to the FWGrid compute clusters
through the Registration Page.
In general your FWGrid login and password will be the same as
on other departmental machines.
Passwords may be changed using the 'passwd' command.
- file systems - Please refer to the document
FWGrid file systems for information about user home directories
and cluster scratch spaces.
- job execution - All jobs should be submitted to the job scheduler
via SGE scripts. This includes pre-processing jobs that prepare
your code for actual execution. The frontend manages many resources, so
compute-intensive jobs running on it will affect the entire cluster.
Please refer to the following documents for instructions:
  - Launching Sequential Jobs Using Grid Engine
  - Launching Parallel Jobs Using Grid Engine
  - Monitoring SGE Jobs
  - Managing SGE Queues
Here are a few helpful flags when submitting your jobs:
* There is a separate 64-bit queue. To submit a job to it, use
-l qname="amd64.q"
* To specify the node you wish to run on, use -l hostname="fwgrid-compute-X-Y"
* To specify the architecture and guarantee 32-bit or 64-bit execution, use -l arch="*x86" or
-l arch="*amd64"
* To request an entire node so that no other jobs may run on it, use -l exclusive=2
* To request that a given amount of memory be free on a node, use -l mem_free=XG
Note - this only applies when your job first launches. If you request 2G free and
your job starts on a node with 2+ G free but takes time to ramp up, another
job can be placed on that node and consume RAM as well, meaning that you may not truly
have 2G available. In this case it is best to use the "exclusive" flag as well,
i.e. -l exclusive=2,mem_free=2G
Please do not abuse this privilege.
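As a rough sketch of how the flags above might fit into an SGE script, the same resource requests can be written as "#$" directive lines at the top of the script instead of being typed on the qsub command line. The job name, the describe_job helper, and the commented-out program name are placeholders invented for this example. The -S, -cwd, -N and -j options are standard qsub options, and the -l values are the ones listed above.

```shell
#!/bin/bash
# Minimal SGE job script sketch. Lines starting with "#$" are read by qsub
# as if the options had been given on the command line.
#$ -S /bin/bash
#$ -cwd
#$ -N example_job
#$ -j y
#$ -l exclusive=2,mem_free=2G

# describe_job is a hypothetical helper for this example, not part of SGE.
# JOB_ID and NSLOTS are set by SGE at run time; the defaults only matter
# when the script is run by hand outside the scheduler.
describe_job() {
    echo "job ${JOB_ID:-unscheduled} using ${NSLOTS:-1} slot(s)"
}

describe_job

# Replace the line below with your own program (hypothetical name):
# ./my_program input.dat
```

Saved as example_job.sh (a placeholder filename), the script would be submitted from the frontend with: qsub example_job.sh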
- support - Questions about the compute cluster,
problem reports, requests for software upgrades/installations, etc.
should be directed to the
support staff.
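Taken together, the access and job execution notes above suggest a session along the following lines. This is only a sketch under stated assumptions: the username and script name are placeholders, and submit_job is a hypothetical wrapper written for this example. It refuses to proceed when qsub is not on the PATH, which is a reasonable sign that you are not logged in to the frontend.

```shell
#!/bin/bash
# Step 1 (on your own machine): log in to the frontend over SSH.
#   ssh your_username@fwg-cs0.ucsd.edu

# Step 2 (on the frontend): submit through SGE rather than running directly.
# submit_job is a hypothetical wrapper, not an SGE command.
submit_job() {
    local script="$1"
    if command -v qsub >/dev/null 2>&1; then
        qsub "$script"
    else
        echo "qsub not found: log in to fwg-cs0.ucsd.edu and submit $script there" >&2
        return 1
    fi
}

# Step 3 (on the frontend): check on your jobs.
#   qstat -u "$USER"
```

On the frontend, submit_job example_job.sh behaves the same as calling qsub directly. Elsewhere it prints a reminder and returns a non-zero status.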
