= Data Management =
[[TracNav]]

[[TOC(inline)]]

Grid services participating in data management are the ''File Catalog'' (LFC) and the ''Storage Element'' (SE).

 * File Catalog: this service manages the file names used by the user and associates them with replicas. Every grid file with a catalog entry can have several replicas. Catalog file names are Unix-like.
 * Storage Element: this is the service responsible for actually managing the data. It handles access to one file replica. Several SE implementations are available, interfacing with different back-ends such as disks, distributed file systems, and MSS (tapes). An SE is accessed through the ''SRM'' interface, which has several advanced features like space reservation, file pinning, and staging from MSS.

== Main Data Management Commands ==

A set of commands, known as ''lcg_utils'', are the main user-oriented tools for file and data management. They are completely independent of the actual SE implementation used and they take care of consistency between the file catalog and the storage elements. For example, they create logical file names in the catalog when creating a file, they update a catalog entry when adding a new replica, they remove an entry from the catalog if the interaction with the SE is not successful, etc.

''Note: each SE implementation provides its private set of client commands. They are not described here, even though some of them are independent of the SE implementation and may work with others. They are not needed, except in very specific circumstances.''

Most of the commands, when referring to grid files stored on an SE, can use one of the following identifiers for the file:
 * Logical File Name (LFN): entry name in the file catalog, with a Unix-like syntax and starting with `/grid/`''voname''. An LFN must be prefixed with `lfn:`.
 * GUID: immutable identifier associated with a catalog entry (it doesn't change when the LFN is changed or renamed). A GUID must be prefixed with `guid:`.
 * SURL: name of a replica on a particular SE.
   It has a URL format like `srm://se.host.name/path/to/file`.

When referring to a local file, the prefix `file:` is optional and relative file names are supported.

The main commands are:
 * `lcg-cr` (copy and register): this command copies a file available locally (it can also copy a file from another SE) and creates a new file on an SE, registering it in a file catalog. The copy destination '''must be''' a storage element. Even if the source is a file on an SE, this creates a new file and not a new replica of the existing file. By default the logical file name registered in the LFC is generated, but it can be set explicitly using option `-l`. The storage element to use for the destination can be specified with option `-d`. If omitted, the default SE for the VO (as specified by environment variable `VO_VONAME_DEFAULT_SE`) is used. If `-d` is present but specifies only a host name, the file name on the SE is generated. It is recommended to use option `-v` (verbose) to get details about the copy operation and to ease troubleshooting in case of errors.
 * `lcg-cp` (copy): features and syntax very similar to the Unix `cp` command. Source and destination can each be either a local file or a file on an SE, in any combination. Unlike with `lcg-cr`, the destination is not registered in a catalog and cannot be an LFN (the input file may be identified with an LFN or GUID if it resides on an SE). This command is mainly used to get a local copy of an SE file on a UI or WN.
 * `lcg-rep`: this command adds a new replica to an existing file registered in a file catalog. The syntax is similar to `lcg-cr` but the source '''must be''' an LFN. Unlike `lcg-cr`, this command updates an existing entry in the file catalog and doesn't create a new one.
 * `lcg-lr` (list replicas): lists all the replicas associated with a given LFN or GUID.
 * `lcg-ls`: features and syntax very similar to the Unix `ls` command.
 * `lcg-del`: removes a file replica from an SE and optionally removes the entry from the LFC after removing all replicas.
 * `lcg-lg`: returns the GUID associated with an LFN.

All these commands rely on the BDII to guess defaults according to the VO used and to find the appropriate parameters to talk to the selected SE. If the BDII service is not functioning properly, the commands may fail. Option `--nobdii` to the `lcg-xxx` commands may help, but it requires adding many parameters to the command, such as the SE port to use.

All the commands have online help available with option `--help` or through the `man` command.

__Exercises__:
 1. Display information about available SE resources for VO `vo.lal.in2p3.fr`, using command [wiki:Tutorial/SystemInfo#LaCommandelcg-infosites lcg-infosites].
    * How many SEs are available for VO `vo.lal.in2p3.fr`?
    * Find the same information for VO `dteam`. How many SEs are available for this VO?
 1. Create a text file and copy it to an SE using command `lcg-cr`, using option `-l` to define the logical name (see above for the logical name format).
 1. Check that the file is present in the catalog and list its replicas, using command `lcg-lr` with both the logical file name and the GUID. This should return the SURL of each replica (a SURL starts with prefix `srm:`).
 1. Display detailed information about the file, using command `lcg-ls`.
 1. Find the GUID associated with the previous file.
 1. Replicate the file on another SE open to the VO (using the information obtained from `lcg-infosites`).
 1. Check that the new replica has been added to the catalog entry for the file, using command `lcg-lr` as previously. Compare the information returned.
 1. Copy the file from the SE to your local disk, using command `lcg-cp`. Do it using the GUID, the LFN and a replica SURL.
 1. Erase one of the replicas using command `lcg-del` and check the result with `lcg-lr`.
 1. Erase the other replica using command `lcg-del` and check the result with `lcg-lr`.
 1. Recreate a file with 2 replicas and try to erase all the replicas at once. Try to do a `lcg-lr` and a `lcg-ls`.

== Using File Catalog ==

The ''File Catalog'' service (also known as ''replica catalog'') in gLite is implemented using LFC. The LFC service has a client implemented through commands whose names start with `lfc-`. These commands are installed on every UI and WN.

These LFC commands are seldom used: LFC interaction is normally done with commands that handle the necessary interactions with both the LFC and SE services to implement actual file management (mainly the so-called ''lcg_utils'' commands, whose names start with `lcg-`).

There is one exception, `lfc-mkdir`, which can be used to create empty directories. The `lcg-xxx` commands will create missing directories, but cannot create an empty directory. The command `lfc-mkdir` supports a `-p` option to create parent directories, similar to the same option in Unix `mkdir`.

All of these commands require a valid proxy with a valid VOMS extension.

To use the `lfc-xxx` commands, either set the environment variable `LFC_HOST` to the host name of the LFC server to use or prefix the name with `lfc.host.domain:`. The second method is the preferred one.

=== LFC client main commands ===

The main LFC commands that may be used under normal circumstances are:
 * `lfc-mkdir [-p]`: creates a directory in the LFC and optionally its parents.
 * `lfc-ls [-l]`: similar to the Unix `ls` command. Gives the content of a directory and optionally detailed information in a Unix-like format. User and group are the DN of the owner and the FQAN of the primary VOMS group.
 * `lfc-getacl / lfc-setacl`: the LFC namespace supports Posix-like ACLs. See the man page for more information on the syntax, in particular for `lfc-setacl`.
   A file name has one ACL but a directory has 2 different ACLs: the ACL controlling access to the directory and the default ACL applied to new files (new sub-directories inherit the parent ACL).

''Note: The LFC only deals with logical file names. Because of this, the arguments to the `lfc-xxx` commands should '''NOT''' be prefixed with `lfn:`''.

__Exercises__:
 1. Determine the central LFC catalog for the VO `vo.lal.in2p3.fr`.
 1. Create an empty directory within the LFC.
 1. Verify that the empty directory exists with both `lfc-ls` and `lcg-ls`.
 1. Remove the directory you created (`lfc-rm -r`) and verify that it is gone.

== Access Data on the Grid from a Job ==

In the job description, it is possible to describe the grid data that the job needs to access. Data files can be identified by an LFN, a GUID or a SURL. This description is taken into account by the WMS to select an appropriate site: a CE will be selected only if one of the file replicas is on one of the ''close SEs'' for the CE (this is configuration information for each CE). The job can then retrieve the list of files provided, the replicas used, etc. with command `glite-brokerinfo` (available only on WNs).

To specify the grid files the job needs access to, add the following two lines to the JDL:
{{{
DataAccessProtocol = {"gsiftp", "rfio"};
InputData = {"file1", "file2", ...};
}}}

`DataAccessProtocol` is a list of transfer/access protocols that the job will use to access the data. The protocols in the example are the most common, but other possibilities are `xroot`, `dcap`, and `https`. There is no default and this attribute is required if `InputData` is present.

File names specified in `InputData` can be LFNs, GUIDs or SURLs. The standard syntax is used with the corresponding prefix `lfn:`, `guid:` or `srm:`. See [wiki:Tutorial/DataMgt#MainDataManagementCommands above].

''Note: files specified in `InputData` are not copied to the WN, unlike files in `InputSandbox`.
It is the responsibility of the job to do what is relevant and appropriate.''

The `glite-brokerinfo` command accepts many options, the main ones being:
 * `getCE`
 * `getDataAccessProtocol`
 * `getInputData`
 * `getSEs`
 * `getCloseSEs`
 * `getSEFreeSpace`
 * `getLFN2SFN`
 * `getSEProtocols`

__Exercises__:
 1. Modify one of your existing JDL files and add the clauses described above, referring to a file you created with the `lcg-cr` command.
 1. Use the `glite-wms-job-list-match` command to display the list of sites which can execute the job. Compare with the list obtained without the `InputData` clause.
 1. Execute the job after adding the `glite-brokerinfo getCloseSEs` and `glite-brokerinfo getInputData` commands to check the information passed to the job.
 1. Create a second file on a different SE and require both files in your job description. What does `glite-wms-job-list-match` return in this case?

== GFAL ==

GFAL (Grid File Access Library) is a set of low-level and high-level APIs providing:
 * Posix-like functions to access files: `gfal_open()`, `gfal_read()`, `gfal_write()`, `gfal_close()`... These functions use a SURL to identify the file.
 * An API to the file catalog to do LFN/GUID to SURL translation
 * An API to the advanced features of SRM

GFAL is available for C/C++, Python and Perl. There is no Java implementation. In fact, the `lcg-xxx` commands are wrappers around GFAL.

Documentation with examples is available in the man pages for the functions.
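
A script that drives GFAL or the `lcg-xxx` commands often needs to know which kind of file identifier it has been given, for example to decide whether an LFN/GUID to SURL translation is needed first. The following is a minimal sketch in plain Python that relies only on the prefix conventions described on this page (`lfn:`, `guid:`, `srm:`, optional `file:`). It does not call GFAL or any grid library, and the function name `classify_grid_name` is hypothetical, chosen for illustration.

```python
def classify_grid_name(name):
    """Classify a file identifier by its prefix.

    Returns one of "LFN", "GUID", "SURL" or "LOCAL".
    Only the prefix is inspected; the name itself is not validated.
    """
    # Prefix conventions taken from this page.
    prefixes = (
        ("lfn:", "LFN"),     # logical file name in the catalog
        ("guid:", "GUID"),   # immutable catalog identifier
        ("srm:", "SURL"),    # replica name on a particular SE
        ("file:", "LOCAL"),  # explicit local file
    )
    for prefix, kind in prefixes:
        if name.startswith(prefix):
            return kind
    # The `file:` prefix is optional, so an unprefixed name
    # is treated as a local (possibly relative) file.
    return "LOCAL"


if __name__ == "__main__":
    for example in ("lfn:/grid/vo.lal.in2p3.fr/test.txt",
                    "srm://se.host.name/path/to/file",
                    "mydata.txt"):
        print(example, "->", classify_grid_name(example))
```

Only names classified as `SURL` identify a specific replica; names classified as `LFN` or `GUID` refer to a catalog entry that may have several replicas.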