Sunday, October 14, 2012

ldom commands

1. ldm list-devices -a
2. ldm list-bindings primary
3. ldm set-mau 1 primary
4. ldm set-vcpu 8 primary
5. ldm set-memory 2g primary
6. ldm list-domain -l primary
7. ldm list-services

8. ldm add-vdiskserver primary-vds0 primary - adds a virtual disk service
9. ldm add-vconscon port-range=99-199 primary-vcc0 primary - adds a virtual console concentrator with a port range
10. ldm add-vswitch net-dev=nxge0 primary-vsw0 primary - adds a virtual switch service
11. ldm list-services primary
12. ldm list-spconfig - lists configurations saved on the service processor
13. ldm list
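
A minimal sketch of creating a guest domain using the services above (the domain name ldg1, disk backend path and vnet name are assumptions for illustration):

# ldm add-domain ldg1
# ldm add-vcpu 8 ldg1
# ldm add-memory 4G ldg1
# ldm add-vnet vnet1 primary-vsw0 ldg1 - virtual network device over the switch
# ldm add-vdsdev /dev/dsk/c1t1d0s2 vol1@primary-vds0 - export a backend device through the disk service
# ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1
# ldm bind-domain ldg1
# ldm start-domain ldg1
# ldm list - shows the console port assigned from the vcc range; telnet to it on localhost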

Still updating.....

Thursday, August 16, 2012

Running Hardware diags on Solaris and Linux

Solaris -

# prtdiag -v

The default tool that comes with Solaris; displays system diagnostic information.

------------------------------------------------------

#/opt/FJSVmadm/sbin/hrdconf -l

Fujitsu native hardware command for SPARC servers. Gives a detailed configuration of the hardware status.
------------------------------------------------------


Interactive

#/opt/FJSVmadm/sbin/madmin

The following menu options will show any HW failures/alerts.

- 2. Hardware Monitoring Information
-  1. Management of Hardware Error Event
- select d 

You will get something like this -

Display abnormalities

Units detected abnormality:

No.  Date                     Unit


Select unit by number.
You can confirm details and select appropriate action.

------------------------------------------------------

'/var/adm/messages' is a very good place to look for hardware-related messages. Solaris has a very good feature of logging each and every kernel event.

If there is any doubt, it is always suggested to go through the logs.

Example: The following events indicate scsi controller failure which was not visible in prtdiag.


Jan 11 23:24:19 wpsh214 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@5 (glm2):
Jan 11 23:24:19 wpsh214         Unexpected DMA state: WAIT. dstat=a0<DMA-FIFO-empty,bus-fault>
Jan 11 23:24:19 wpsh214 genunix: [ID 408822 kern.info] NOTICE: glm2: fault detected in device; service still available
Jan 11 23:24:19 wpsh214 genunix: [ID 611667 kern.info] NOTICE: glm2: Unexpected DMA state: WAIT. dstat=a0<DMA-FIFO-empty,bus-fault>
Jan 11 23:24:19 wpsh214 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@5 (glm2):
Jan 11 23:24:19 wpsh214         got SCSI bus reset
Jan 11 23:24:19 wpsh214 genunix: [ID 408822 kern.info] NOTICE: glm2: fault detected in device; service still available
Jan 11 23:24:19 wpsh214 genunix: [ID 611667 kern.info] NOTICE: glm2: got SCSI bus reset
Jan 11 23:24:27 wpsh214 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@5/sd@c,0 (sd48):

------------------------------------------------------

Serial

#/opt/FJSVmadm/sbin/serialid -a

Model

#/opt/FJSVhwr/sbin/getmodelcode

------------------------------------------------------

#/opt/FJSVhwr/sbin/fjprtdiag -v

Fujitsu's version of prtdiag, with extended output for Fujitsu SPARC hardware.

------------------------------------------------------

'fmadm faulty' 

This is a very useful command which lists diagnosed faults on a Solaris system running on SPARC.

One drawback of this command is that the faults are not cleared automatically; we have to clear them manually. If not, we may end up troubleshooting some old messages.

http://docs.oracle.com/cd/E19166-01/E20792/z40031541454621.html
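
A minimal sketch of listing and then clearing a fault (the UUID is a placeholder; use the one printed by fmadm faulty):

# fmadm faulty - list the current diagnosed faults with their UUIDs
# fmdump -v -u <UUID> - show the detailed events for one fault
# fmadm repair <UUID> - mark the fault as repaired so it no longer shows up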

------------------------------------------------------

Apart from the above, some of the Sun consoles give detailed hardware info.

Eg: E- and M-series servers have domain and platform consoles; other systems have ILOM or ALOM consoles.

_________________________________________________________________________


Linux - Most of the Linux physical servers are hosted on HP or Dell hardware.

HP has its own CLI HW utility and so does Dell.

HP -

1. hpasmcli - HP Management Command Line Interface, a scriptable command line tool to manage and monitor HP hardware.
2. hpacucli - HP Array Configuration Utility.
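
A quick sketch of typical checks with these tools (subcommands as commonly used; verify against the versions installed):

# hpasmcli -s "show server" - general server health summary
# hpasmcli -s "show dimm" - memory module status
# hpasmcli -s "show powersupply" - PSU status
# hpacucli ctrl all show config - array controllers and logical drive layout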

Dell -

1. omreport - the reporting CLI from Dell OpenManage Server Administrator (e.g. omreport system summary, omreport chassis).




Saturday, July 14, 2012

How to create a new file with current date stamp?

 Use `date '+%Y%m%d%H%M'` to add a date stamp to a file name.

Example:

#touch file.`date '+%Y%m%d%H%M'`

will create a file named

file.201207140947
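
The same trick is handy for quick backups before editing a file (the path here is just an example):

# cp /etc/system /etc/system.`date '+%Y%m%d%H%M'`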

Saturday, January 7, 2012

Command 'fsck' in Solaris.

Checking and Repairing Unix File system with fsck

fsck is a Unix utility for checking and repairing file system inconsistencies. A file system can become inconsistent due to several reasons, the most common being an abnormal shutdown caused by hardware failure, power failure, or switching off the system without a proper shutdown. In these cases the superblock in the file system is not updated and holds mismatched information about system data blocks, free blocks and inodes.

fsck - Modes of operation:

fsck operates in two modes, interactive and non-interactive:

Interactive - fsck examines the file system, stops at each error it finds, gives the problem description and asks for a user response on whether to correct the problem or continue without making any change to the file system.

Non-interactive - fsck tries to repair all the problems it finds in a file system without stopping for user responses. This is useful in case of a large number of inconsistencies in a file system, but has the disadvantage of removing some useful files which are detected to be corrupt.

If the file system is found to have problems at boot time, a non-interactive fsck is run and all errors which are considered safe to correct are corrected. If the file system still has problems, the system boots into single user mode, asking the user to run fsck manually to correct them.

Running fsck:
fsck should always be run in single user mode, which ensures proper repair of the file system. If it is run on a busy system where the file system is changing constantly, fsck may see the changes as inconsistencies and may corrupt the file system.

If the system cannot be brought into single user mode, fsck should be run on the partitions other than root & usr after unmounting them. Root & usr partitions cannot be unmounted on a running system. If the system fails to come up due to root/usr file system corruption, the system can be booted from CD and the root/usr partitions can be repaired using fsck.
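
A minimal sketch of checking a non-root file system (the device and mount point are examples):

# umount /export/home
# fsck -F ufs /dev/rdsk/c0t3d0s7
# mount /export/home

Note that the character (raw) device is used for the check, as explained below.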

Command syntax:

fsck [ -F fstype] [-V] [-yY] [-o options] special

-F fstype - type of file system to be repaired (ufs, vxfs etc.)
-V - verify the command line syntax but do not run the command
-y or -Y - run the command in non-interactive mode; repair all errors encountered without waiting for a user response
-o options - three options can be specified with the -o flag:
   b=n - where n is the number of an alternate superblock, used if the primary superblock is corrupted
   p - make safe repairs during the booting process
   f - force the file system check regardless of its clean flag

special - block or character device name of the file system to be checked/repaired, for example /dev/rdsk/c0t3d0s4. The character device should be used for consistency checks & repair.

fsck phases:
fsck checks the file system in a series of 5 phases, checking a specific functionality of the file system in each phase.
** phase 1 – Check Blocks and Sizes
** phase 2 – Check Pathnames
** phase 3 – Check Connectivity
** phase 4 – Check Reference Counts
** phase 5 – Check Cylinder Groups
fsck error messages & Corrective action :

1. Corrupted superblock - fsck fails to run
If the superblock is corrupted, the file system can still be repaired using an alternate superblock; these are created while making a new file system.
The first alternate superblock number is 32, and the other superblock numbers can be found using the following command:

newfs -N /dev/rdsk/c0t0d0s6

For example, to run fsck using the first alternate superblock:

fsck -F ufs -o b=32 /dev/rdsk/c0t0d0s6

2. Link counter adjustment
fsck finds a mismatch between directory inode link counts and the actual directory links, and prompts for an adjustment in interactive operation. Link count adjustments are considered a safe operation in a file system and should be made by giving a 'y' response to the ADJUST? prompt during fsck.

3. Free block count salvage
During fsck, the number of free blocks listed in the superblock and the actual count of unallocated free blocks may not match. fsck reports this mismatch and asks to salvage the free block count to synchronize the superblock. This error can be corrected without any potential problem to the file system or its files.

4. Unreferenced file reconnection
While checking connectivity, fsck finds some inodes which are allocated but not referenced, i.e. not attached to any directory. Answering y to the RECONNECT message links these files into the lost+found directory, with their inode number as their name.
To get more info about the files in lost+found, the 'file' command can be used to see the type of each file; subsequently they can be opened in their applications or text editors to find out about their contents. If a file is found to be intact, it can be used after copying it to some other directory and renaming it.
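
A short sketch of inspecting reconnected files (the path and the inode-number file name are illustrative):

# cd /export/home/lost+found
# file * - identify the file types before opening them
# cp 4321 /var/tmp/recovered.file - the name is the file's inode number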

Booting Process Explained

Booting Process in Solaris

Understanding the booting process is important in the sense that you can get a clear idea of where a system faces a booting problem if you are familiar with the booting sequence and the steps involved. You can thereby isolate a booting phase and quickly resolve the issue.

The booting process in Solaris can be divided into different phases for ease of study. The first phase starts at the time of switching on the machine and is the boot PROM level; it displays an identification banner mentioning the machine host ID, serial number, architecture type, memory and Ethernet address. This is followed by the self test of the various subsystems in the machine.

This process ultimately looks for the default boot device and reads the boot program from the boot block, which is located on blocks 1-15 of the boot device. The boot block contains the ufs file system reader, which is required by the next boot stage.

The ufs file system reader opens the boot device and loads the secondary boot program from /usr/platform/`uname -i`/ufsboot (uname -i expands to the system architecture type).
This boot program loads a platform-specific kernel along with a generic Solaris kernel.
The kernel initializes itself and loads the modules required to mount the root partition, continuing the booting process.

The booting process undergoes the following phases afterwards:
1) init phase
2) inittab file
3) rc scripts & run levels

1. INIT phase
The init phase is started by the execution of the /sbin/init program, which starts other processes after reading the /etc/inittab file, as per the directives in that file.
The two most important functions of init are:
a) It runs the processes that bring the system to the default run level state (run level 3 in Solaris, defined by the initdefault parameter in /etc/inittab).
b) It controls the transition between different run levels by executing the appropriate rc scripts to start and stop the processes for each run level.

2. /etc/inittab file
This file states the default run level and the actions to be performed when the system reaches that level. The fields and their explanation are as follows:

S3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console

S3 denotes an identification of the line
3 is the run level
wait is the action to be performed
/sbin/rc3 is the command to be run.

So the fields in inittab are:
identification : run level : action : process

The complete line thus means: run the command /sbin/rc3 at run level 3 and wait until the rc3 process is complete.
The action field can have any of the following keywords:
initdefault : default run level of the system
respawn : start the process and restart it if it stops
powerfail : run the process when a power failure signal is received
sysinit : start the process and wait till the console is accessible
wait : wait till the process ends before going on to the next line

3. RC scripts & run levels
The rc scripts perform the following functions:
a) They check and mount the file systems
b) They start and stop the various processes like network, nfs etc.
c) They perform some of the housekeeping jobs.

The system goes into one of the following run levels after booting, depending on the default run level and the commands issued for changing the run level:

0   Boot PROM level, ok> or > prompt in Sun.
1   Administrative run level. Single user mode.
2   Multiuser mode with no resource sharing.
3   Multiuser level with nfs resource sharing.
4   Not used.
5   Shutdown & power off (Sun 4m and 4u architecture).
6   Reboot to default run level.
S s Single user mode, user logins are disabled.

Broadly speaking, the running system can be in any of the following states:
Single user - minimum processes running, user logins disabled and the root password is required to gain access to the shell.
Multiuser - all system processes are running and user logins are permitted.

The run level of a desired state is achieved by a number of scripts executed by the rc program. The rc scripts are located in the /etc/rc0.d, /etc/rc1.d, /etc/rc2.d, /etc/rc3.d & /etc/rcS.d directories. All the files of a particular run level are executed in alphanumeric order. Those files beginning with the letter S start processes and those beginning with K stop processes.
These files are hard linked to the files in /etc/init.d in order to provide a central location for all these files, eliminating the need to change the run level in case these scripts need to be run separately. The files in the /etc/init.d directory are without any S, K or numeric prefix; instead a stop/start argument has to be supplied whenever these scripts are executed.
By default the system has a number of rc scripts needed for run level transitions, but sometimes it becomes necessary to start some custom scripts at boot time and stop them at shutdown. Custom scripts can be put in any of the required rc directories, but the following major considerations have to be kept in mind (see the sketch after this list):
* The sequence number of the file should not conflict with other files.
* The services needed should be made available by previously executed scripts.
* The file should be hard linked to the /etc/init.d directory.
* The system looks only for files beginning with the letters K & S; anything else is ignored. Therefore, to make a file inactive, simply changing the uppercase K or S to lower case will cause the system to ignore it.
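
A minimal sketch of adding a custom startup script this way (the script name myapp and the sequence numbers are examples):

# vi /etc/init.d/myapp - a script that accepts a start or stop argument
# ln /etc/init.d/myapp /etc/rc3.d/S99myapp - hard link; started at run level 3
# ln /etc/init.d/myapp /etc/rc0.d/K01myapp - hard link; stopped at shutdown
# /etc/init.d/myapp start - the same script can also be run manually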

Thursday, June 9, 2011

Boot Procedure in Solaris

The booting process on the Solaris SPARC architecture is done with the help of a middleware layer known as firmware. This firmware is known as OBP (OpenBoot PROM); this layer is used to manage the H/W independently of the operating system. The entire information of OBP is stored inside a 1 MB chip known as a PROM (programmable read only memory) chip.
The parameters stored inside the chip are known as NVRAM (non-volatile random access memory) parameters. We can set parameters such as the booting sequence, identify devices, set alias names for devices and also provide security for devices.

Boot Process:
- Boot PROM
- Boot programme
- Kernel initialization
- init phase
- svc.startd (SMF)

1. Boot PROM: In this phase the primary boot program, known as the boot block, is loaded into memory.
Backend:
a. When the server is powered on, the PROM runs POST (power on self test).
b. POST identifies all the devices, and the boot PROM identifies the bootable devices with the help of the NVRAM parameter "boot-device".
c. It reads the VTOC (volume table of contents) of the boot device, reads sectors 1-15 of slice number zero, and loads the boot block into memory.
d. At installation time, the installed boot programme writes the boot block into sectors 1-15 of the root (/) slice.

2. Boot programme:
This loads the secondary boot program known as UFS boot.

a. In this phase the boot block loads the secondary boot program, known as UFS boot, into memory.
b. The location of UFS boot is stored inside the boot block by the installed boot programme.
c. UFS boot loads the two parts of the kernel, known as unix and genunix, which are platform dependent and platform independent respectively.
d. UFS boot combines these two parts into a single running kernel.

3. Kernel initialization: The kernel reads the file /etc/system to get boot parameters.
a. In this phase UFS boot starts the kernel and the kernel reads the /etc/system file.
b. The kernel cannot load modules by itself, so it takes the help of UFS boot until it loads the root (/) module.
c. Once the root module is loaded, it continues booting by itself.
d. In this phase the kernel starts the first process, known as the init process, with process ID 1.

4. /etc/inittab (init phase):
a. In this phase the first process, init, reads the file /etc/inittab.
b. Till Solaris 9, the system in this phase used to come up with the help of run levels.
c. Run levels define the system state to reach.
d. From Solaris 10 onwards this file does not contain run levels; instead it starts the master starter and restarter daemon known as svc.startd. This daemon enables the main configuration daemon svc.configd, which reads the repository database to start the services.




RUN LEVELS in Solaris 10
0      ----------- ok boot prompt
s or S ----------- Single user with only critical file systems mounted.
1      ----------- Single user with all file systems mounted.
2      ----------- Multiuser without NFS
3      ----------- Multiuser with NFS
4      ----------- Reserved
5      ----------- Shutdown or power off
6      ----------- Reboot

#init is the command to change from one run level to another. The major difference between init s and init 1 is: when we change from run level 3 to run level s and go back to run level 3, the users will be reconnected automatically, but in the case of run level 1 the users will have to reconnect to the server manually.
To know the current run level - #who -r
The output has five fields:
a. The current run level
b. Date and time of the run level change
c. The current system state
d. The number of times at this run level since the last reboot
e. The previous run level
The /etc/inittab file contains four fields:
a. ID
b. Rstate (one or more run levels to which this entry applies)
c. Action (how the process is to be treated)
d. Process (the command or the script to be executed)
When we change from one run level to another, or while booting the system to the appropriate run level, the system executes the run level scripts stored under the /etc directory.

#cd /etc
#ls | grep rc

Each rc#.d directory contains scripts which start with 'S', known as start scripts, and with 'K', known as kill scripts.

#cd rcS.d
#ls

Whenever we set the default run level to 3, the sysinit entry in /etc/inittab will execute the following commands:

/sbin/autopush
/sbin/soconfig
/sbin/rcS

The rstate entries inside inittab will then execute:
/sbin/rc2
/sbin/rc3

From Solaris 10 onwards the system does not boot with the help of run levels; it boots with the help of milestones.
A milestone is defined as a system state to reach.

The repository database is a configuration database which maintains the information about services and how to manage these services.
The repository DB contains the default milestone information with which it boots the server. This improves boot performance, as the services are started in parallel.
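
A small sketch of inspecting milestones and services with the standard Solaris 10 SMF commands:

# svcs -a - list all services and their states
# svcs -d milestone/multi-user:default - list the services the multi-user milestone depends on
# svcadm milestone milestone/single-user:default - take the system to the single-user milestone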
Important points to be remembered:
Path of the bootblk on x86:
#cd /usr/platform/i86pc/lib/fs/ufs
On SPARC machines:
#cd /usr/platform/`uname -i`/lib/fs/ufs (e.g. sun4u)

How to install the boot block - boot the system into single user mode:
ok> boot cdrom -s
* Go to the location above for the root (/) file system's bootblk:
# installboot bootblk /dev/rdsk/c0t0d0s0
The booting process of Solaris has changed from the Solaris 10 2006 release onwards. It has been redesigned in such a fashion as to have a common boot process between the SPARC and x86 architectures. As part of this redesign, Solaris boot archives and the bootadm command were introduced on both architectures.
The primary difference between x86 and SPARC is how the boot device and the files are selected at boot time.
The SPARC based platforms continue to use OBP as the primary administrative interface, with boot options selected by using OBP commands. On the x86 architecture the options are selected through the BIOS and GRUB.
To understand the new design, we have to look at the boot-up and shutdown sequence.
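
A brief sketch of the bootadm commands mentioned above:

# bootadm list-archive - list the files included in the boot archive
# bootadm update-archive - rebuild the boot archive after driver or configuration changes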



Tuesday, June 7, 2011

VCS Brief Notes. Veritas Cluster Server.

                                         VERITAS CLUSTER
- A cluster is a combination of multiple nodes connected inside a network to provide high availability to the application at any point of time.
- There are two types of configurations, asymmetric and symmetric.
- Asymmetric configuration is also known as active-passive configuration: at any point of time one node will be active and the other node will be passive.
- In symmetric configuration both nodes are active, each performing its own task; if one node fails, the other will take the responsibility of running all the services.
- The service groups inside a cluster perform 1) failover, 2) parallel or 3) hybrid operations.
- The two daemons in a cluster are:
1) the HAD daemon and 2) the HA shadow daemon; these two daemons are always running on all the nodes of a cluster.
- The entire configuration of the cluster is managed by the HAD daemon; HA shadow acts as a backup and also performs load balancing.
- There are two main configuration files inside the cluster:
main.cf - maintains the cluster ID, the list of users who can manage the cluster, and the service groups created inside the cluster.
types.cf - maintains the default attribute information.
- The primary task of the HAD daemon is to maintain the same copy of these files across the cluster; it also keeps backups of these files before performing any modifications to the cluster.
- The communication between the cluster nodes is done with the help of high priority links used for private communication; it uses the LLT (Low Latency Transport) and GAB (Group Membership and Atomic Broadcast) protocols for communication between the nodes.
- These files are responsible for maintaining the communication information on each node:
/etc/llthosts - contains node IDs and node names.
/etc/llttab - contains the cluster ID, the node ID and the links used for communication between the nodes.
/etc/gabtab - maintains the membership information of the nodes inside the cluster.
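
A small sketch of what these files typically look like on a two-node cluster (node names, cluster ID and interface names are examples):

/etc/llthosts:
0 node1
1 node2

/etc/llttab:
set-node node1
set-cluster 100
link e1000g1 /dev/e1000g:1 - ether - -
link e1000g2 /dev/e1000g:2 - ether - -

/etc/gabtab:
/sbin/gabconfig -c -n2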
Installation requirements for a cluster:
Minimum 512 MB RAM, NIC cards for private and public communication, and common storage.

Application requirements to install on a cluster:
1) It should have independent stop and start procedures.
2) It should have the ability to clear faults, and an individual monitoring service.
3) It must be capable of running on multiple nodes and should support common storage.

*hastatus -sum - status of the cluster
*hagrp -list - list service groups

Start and stop daemons
cd /etc/rc3.d - contains the scripts used to start and stop the daemons
#hastart
#hastop - stops VCS on the local node
#hastop -all
How to stop a remote node:
#hastop -sys sys1 -evacuate
Taking a snapshot of the VCS configuration files:
#hasnap -backup -m "test backup of nfssg"
Managing users inside VCS:
#hauser -list (lists the users of VCS)
#hauser -display (displays the users along with their privileges)
#hauser -add user1 (creates a user)
#hauser -addpriv user1 Administrator (assigns a privilege to a user)
#hauser -delpriv user1 Administrator (removes a privilege)
#hagrp -list
#hagrp -state - to know the status of the service groups
#hagrp -resources nfssg

Querying resource types:
#hatype -resources IP
#hatype -list
#haclus -list
#haclus -display
#haclus -state
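
A short sketch of common service group operations (the group and system names are examples):

#hagrp -online nfssg -sys sys1 - bring the group online on sys1
#hagrp -offline nfssg -sys sys1 - take the group offline on sys1
#hagrp -switch nfssg -to sys2 - switch the group over to the other node
#hares -state - show the state of the individual resources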