New users
Welcome to Marbits. This page gives you a quick overview of what you need to do and know in order to have a smooth computing experience. Please take a look at the rest of the documentation. If you have any questions or requests, please sign in to the Marbits Slack group.
If you were a biocluster user, you may also be interested in visiting the links at the bottom of this page.
Things you need to use Marbits
You need a few things to work on marbits:
– A marbits user account, associated with at least one project or «bank» account. You or your supervisor must send an email to pablosanchez at icm.csic.es to request the user account and specify the project(s) it will be associated with, for accounting and billing purposes.
– A cabled connection to the local network at CMIMA. If you are going to be around for a while, make sure your computer is integrated into the CMIMA local network.
– Access to the ICM’s VPN or the salamandra server if you work remotely from outside CMIMA. If you are a CSIC employee, the VPN is the way to go. Ask the IT service for either one.
– A computer with a UNIX-compatible terminal app. If you are using Linux or Mac you are all set, but if (unfortunately for you) you are running Windows you’ll need to install a telnet/ssh emulator like console or PuTTY (do some research; I haven’t tried any of these, but I know people who have used PuTTY with no issues). Alternatively, you could create a virtual machine in VirtualBox and install any Linux distribution you like, such as Ubuntu (or ditch Windows altogether and install Linux as your main operating system once and for all).
First steps
Marbits is a High Performance Computing (HPC) cluster. It basically consists of a series of computers (nodes) connected via a network under the command of a master node. It provides computing capabilities that a regular laptop or PC can’t: it has many computing cores and quite a large amount of RAM. Its architecture allows several users to run jobs at the same time, sometimes using several cores to speed up the calculations. Please take a look at the System description page for more details.
All this means that you need to interact with marbits differently than with your laptop. You access it over a network connection, and you are not going to have fancy windows or a mouse, just a plain Command Line Interface (CLI). Hence, a basic knowledge of terminal commands is required to use marbits. If you need help with that, we have put together a short and to-the-point introduction to UNIX and the Shell for beginners. There are also plenty of resources and books all over the web; check out the Software Carpentry lesson on the Unix shell. From time to time we gather for a series of seminars, the LaPera Sessions, to give beginners some tools to increase their bioinformatics skills.
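If you have never used a terminal, the following is a minimal sketch of the kind of commands meant by "basic knowledge". It runs in a throwaway directory created with mktemp, so nothing important is touched; the file and directory names (results, notes.txt) are made up for illustration.

```shell
# Work in a temporary directory so no real files are affected.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                                            # print the directory you are in
mkdir results                                  # create a directory
echo "hello marbits" > results/notes.txt       # write a line of text to a file
ls -l results                                  # list the directory contents
cat results/notes.txt                          # print the file on screen
cp results/notes.txt results/notes_copy.txt    # copy a file
wc -l results/notes.txt                        # count the lines in a file
rm results/notes_copy.txt                      # delete a file (there is no trash bin)
```

Each of these commands has a manual page; for example, man cp explains the options of cp.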
Logging in
From a terminal window on your computer, type
ssh username@marbits.cmima.csic.es
The first time you log in, it is advisable to change your default password. Type
passwd
and follow the instructions. Remember that you can always get help on how a command works with
man command_name
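If you log in often, you may prefer a shorter command. The sketch below assumes you use the OpenSSH client, which reads host aliases from ~/.ssh/config. The alias name marbits and the helper function add_marbits_alias are our own invention for illustration, not something the cluster provides.

```shell
# Append a "marbits" host alias to an OpenSSH client config file,
# unless an entry with that alias already exists.
# Usage: add_marbits_alias USERNAME [CONFIG_FILE]
add_marbits_alias() {
    user="$1"
    config="${2:-$HOME/.ssh/config}"

    # Make sure the config file exists and is readable only by you.
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"

    if grep -q '^Host marbits$' "$config"; then
        echo "alias already present in $config"
        return 0
    fi

    cat >> "$config" <<EOF

Host marbits
    HostName marbits.cmima.csic.es
    User $user
EOF
    echo "alias added to $config"
}

# Example (replace "username" with your own account name):
#   add_marbits_alias username
#   ssh marbits
```

After that, ssh marbits should be equivalent to typing the full ssh username@marbits.cmima.csic.es command.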
Basic set of rules (there are always rules)
- Don’t run jobs on the head node (marbits, yes, the computer you just logged in to with ssh). See how to run jobs.
- Your /home directory (the place where you land when you log in to marbits) is currently very small. Use it only to store your configuration files and small scripts or programs. Don’t run jobs that write to your /home directory, not only because they may not have enough room there, but also because it degrades the performance of the cluster. Your /home is protected by a quota and can’t take more than 5 GB of storage. Check how your quota is holding up by typing quota.
- All the stuff you want to keep should go to a directory with your username that is linked in your /home and is physically located in the Lustre filesystem at /mnt/lustre/bio/users/yourUserName. It is very important that you follow this procedure, because the whole filesystem is backed up every night, and temporary files, large files, or files that change a lot outside the scratch volume severely affect the back-up process. Feel free to build as many links to your data in the Lustre filesystem from your marbits /home as you need to make your life easier.
- Make your jobs write their output to /mnt/lustre/scratch. This is a scratch directory mounted in the Lustre filesystem. It has no back-up, and an automated task deletes all files older than 3 months, so keep reading…
- If you have big original datasets, ask Pablo for assistance. All datasets that are public or can be downloaded again may go to /mnt/lustre/repos and are read-only. You are not supposed to write there.
- Storage quotas for users and projects are not yet defined, but you don’t have infinite storage space in the Lustre. Please use common sense and delete anything that no longer has a purpose, like temporary files. Also compress all your files when possible. Administrators will monitor space usage.
- And finally, DO NOT run any analysis on the head node. Seriously. Look at how to run jobs. The master node is heavily monitored and jobs running there will be mercilessly killed.
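The storage rules above can be sketched as a few shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical, and the per-user layout inside scratch (scratch/USER/JOB) is our assumption, since this page only names /mnt/lustre/scratch itself. The 60-day default is an arbitrary safety margin ahead of the 3-month cleanup.

```shell
# Defaults follow the paths named on this page; override them to try the
# helpers somewhere else.
LUSTRE_USERS="${LUSTRE_USERS:-/mnt/lustre/bio/users}"
SCRATCH_ROOT="${SCRATCH_ROOT:-/mnt/lustre/scratch}"

# Create an output directory for a job under scratch and print its path.
# ASSUMPTION: a scratch/USER/JOB layout, not confirmed by the documentation.
make_scratch_dir() {
    job="$1"
    dir="$SCRATCH_ROOT/${USER:-$(id -un)}/$job"
    mkdir -p "$dir" && echo "$dir"
}

# List files under a directory that were last modified more than N days ago
# (default 60): candidates to rescue or delete before the automated cleanup.
list_old_files() {
    find "$1" -type f -mtime +"${2:-60}"
}

# Compress a results directory and store the archive in the backed-up
# user directory (or in another destination given as second argument).
keep_results() {
    src="$1"
    dest="${2:-$LUSTRE_USERS/${USER:-$(id -un)}}"
    name="$(basename "$src")"
    tar -czf "$dest/$name.tar.gz" -C "$(dirname "$src")" "$name"
}

# Example session (run from a job, not on the head node):
#   out="$(make_scratch_dir my_assembly)"
#   ... job writes its output into "$out" ...
#   list_old_files "$SCRATCH_ROOT/$USER"
#   keep_results "$out"
```

Keeping only a compressed archive in the backed-up area, and everything else in scratch, is one way to stay within the spirit of the rules; adapt the paths to whatever layout your project actually uses.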
Existing biocluster users
If you were using biocluster, you may want to read what has changed from biocluster to marbits and where do I start now.