Oracle DBA Blogs

Tuesday, November 12, 2013

How to sync system and database time stamp

This post belongs to if there any change in time stamp of system and database. For i.e

SQL>select to_char(sysdate,'dd-mon-yyyy') from dual;

Output of this query will show the database time stamp

SQL > !date

output of this query will show the system time stamp

If output of both above queries are same then all is well otherwise follow the below steps to sync the time stamp:

1. cd $GRID_HOME/crs/install
2. cp -s s_crsconfig_HOSTNAME.txt s_crsconfig_hostname.txt_bkp ( backup the original file)
3. replace the TZ=newtimezone (whatever you want for DB) in s_crsconfig_HOSTNAME.txt
4. Take the output of crsctl stat res -t
5. crsctl stop has ( from root user)
6. crsctl start has ( from root user).....
7. After HAS starts check the output of crsctl stat res -t to see if all oracle related services and all targets are up and running .
8. Connect to database remotely and see if the desired time stamp is displaying as per server .

Hope this works for you :)

Thanks,

Monday, February 7, 2011

Oracle 10g CRS troubleshooting tips and tricks

A very good article @ http://www.ritzyblogs.com/OraTalk/PostID/113/Oracle-10g-CRS-troubleshooting-tips-and-tricks: same is as below :

Oracle 10g CRS troubleshooting tips and tricks

Whenever a node is having issues joining the cluster back post reboot, here is a quick check list I would suggest:

 /var/log/messages

 ifconfig

 ip route

 /etc/hosts

 /etc/sysconfig/network-scripts/ifcfg-eth*

 ethtool

 mii-tool

 cluvfy

 $ORA_CRS_HOME/log

Let us now take a closer look at specifc issues with examples and steps taken for their resolution.

These are all tested on Oracle 10.2.0.4 database on RHEL4 U8 x-64

1. srvctl not able to start Oracle Instance but sqlplus able to start

a. Check racg log for actual error message.

% more $ORACLE_HOME/log/`hostname -s`/racg/ora.{DBNAME}.{INSTANCENAME}.inst.log

b. Check if srvctl is configured to use correct parameter file(pfile/spfile)

% srvctl config database -d {DBNAME} -a

You can also validate parameter file by using sqlplus to see the exact error message.

c. Check ownership for $ORACLE_HOME/log

If this is owned by root, srvctl won't be able to start instance as oracle user.

# chown -R oracle:dba $ORACLE_HOME/log

2. VIP has failed over to another node but is not coming back to the original node

Fix: The node where the VIP has failed over, bring it down manually as root

Example: ifconfig eth0:2 down

PS: Be careful to bring down only VIP. A small typo may bring down your public interface:)

3. Moving OCR to a different location

PS: This can be done while CRS is up as root.

While trying to change ocr mirror or the ocr to a new location, ocrconfig complaints.

The fix is to touch the new file.

Example:

# ocrconfig -replace ocrmirror /crs_new/cludata/ocrfile

PROT-21: Invalid parameter

# touch /crs_new/cludata/ocrfile

# chown root:dba /crs_new/cludata/ocrfile

# ocrconfig -replace ocrmirror /crs_new/cludata/ocrfile

Verify:

a. Validate using "ocrcheck". Device/File Name should point to the new one with integrity check succeeded.

b. Ensure OCR inventory is updated correctly

# cat /etc/oracle/ocr.loc

ocrconfig_loc and ocrmirrorconfig_loc should point to correct locations.

4. Moving Voting Disk to a different location

PS: CRS must be down while moving the voting disk.

The idea is to add new voting disks and delete the older ones.

Find below sample errors and their fix.

# crsctl add css votedisk /crs_new/cludata/cssfile_new

Cluster is not in a ready state for online disk addition

We need to use force option. However, before using force option, ensure CRS is down.

If CRS is up, DO NOT use force option else it may corrupt your OCR.

# crsctl add css votedisk /crs_new/cludata/cssfile_new -force

Now formatting voting disk: /crs_new/cludata/cssfile_new

successful addition of votedisk /crs_new/cludata/cssfile_new.

Verify using "crsctl query css votedisk" and then delete the old votedisks.

While deleting too, you'll need to use force option.

Also verify the permissions of the voting disk files. It should be oracle:dba

If voting disks were added using root, the permission should be changed to oracle:dba

5. Manually registering listener resource to OCR

Listener was registered manually with OCR but srvctl was unable to bring up the listener

Let us first see example of how to manually do this.

From an existing available node, print the listener resource

% crs_stat -p ora.test-server2.LISTENER_TEST-SERVER2.lsnr > /tmp/res

% cat /tmp/res

NAME=ora.test-server2.LISTENER_TEST-SERVER2.lsnr

TYPE=application

ACTION_SCRIPT=/orahome/ora10g/product/10.2.0/db_1/bin/racgwrap

ACTIVE_PLACEMENT=0

AUTO_START=1

CHECK_INTERVAL=600

DESCRIPTION=CRS application for listener on node

FAILOVER_DELAY=0

FAILURE_INTERVAL=0

FAILURE_THRESHOLD=0

HOSTING_MEMBERS=test-server2

OPTIONAL_RESOURCES=

PLACEMENT=restricted

REQUIRED_RESOURCES=ora.test-server2.vip

RESTART_ATTEMPTS=5

SCRIPT_TIMEOUT=600

START_TIMEOUT=0

STOP_TIMEOUT=0

UPTIME_THRESHOLD=7d

USR_ORA_ALERT_NAME=

USR_ORA_CHECK_TIMEOUT=0

USR_ORA_CONNECT_STR=/ as sysdba

USR_ORA_DEBUG=0

USR_ORA_DISCONNECT=false

USR_ORA_FLAGS=

USR_ORA_IF=

USR_ORA_INST_NOT_SHUTDOWN=

USR_ORA_LANG=

USR_ORA_NETMASK=

USR_ORA_OPEN_MODE=

USR_ORA_OPI=false

USR_ORA_PFILE=

USR_ORA_PRECONNECT=none

USR_ORA_SRV=

USR_ORA_START_TIMEOUT=0

USR_ORA_STOP_MODE=immediate

USR_ORA_STOP_TIMEOUT=0

USR_ORA_VIP=

Modify relevant parameters in the resource file to point to correct instance.

Rename as resourcename.cap

% mv /tmp/res /tmp/ora.test-server1.LISTENER_TEST-SERVER1.lsnr.cap

Register with OCR

% crs_register ora.test-server1.LISTENER_TEST-SERVER1.lsnr -dir /tmp/

Start listener

% srvctl start listener -d testdb -n test-server1

While trying to start listener, srvctl is throwing errors like "Unable to read from listener log file"

The listener log file exists.

If resource is registered using root, then srvctl won't be able to start using oracle user.

So all the aforementioned operations while registering the listener manually should be done using oracle user.

6. Services

While checking status of a service, it says "not running"

If we try to start it using srvctl, the error message is "No such service exists" or "already running"

If we try to add service with same name, it says "already exists"

This happens because the service is in an "Unknown" state in the OCR

Using crs_stat, check if any related resource for service(resource names ending with .srv and .cs) is still lying around.

srvctl remove service -f has been tried and the issue persists.

Here is the fix:

# crs_stop -f {resourcename}

# crs_unregister {resourcename}

Now service can be added and started correctly.

7. Post host reboot, CRS is not starting

After host reboot, CRS was not coming up. No CRS logs in $ORA_CRS_HOME

Check /var/log/messages

"Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.9559"

No logs seen in /tmp/crsctl.*

Run cluvfy to identify the issue

$ORA_CRS_HOME/bin/cluvfy stage -post crsinst -n {nodename}

/tmp was not writable

/etc/fstab was incorrect and was fixed for making /tmp available

If you see messages like "Shutdown CacheLocal. my hash ids don't match" in the CRS log, then

check if /etc/oracle/ocr.loc is same across all nodes of the cluster.

8. CRS binary restored by copying from existing node in the cluster

CRS not starting with following messages in /var/log/messages;

"Id "h1" respawning too fast: disabled for 5 minutes"

CRSD log showing "no listener"

If CRS binary is restored by copying from existing node in the cluster, then you need to ensure:

a. Hostnames are modified correctly in $ORA_CRS_HOME/log

b. You may need to cleanup socket files from /var/tmp/.oracle

PS:Exercise caution while working with the socket files. If CRS is up, you should never touch those files otherwise reboot may be inevitable.

9. CRS rebooting frequently by oprocd

Check /etc/oracle/oprocd/ and grep for "Rebooting".

Check /var/log/messages and grep for "restart"

If the timestamps are matching, this confirms reboots are being initated by oprocd process.

%ps -ef
grep oprocd

root 10409 9937 0 Feb27 ? 00:00:00 /oracle/product/crs/bin/oprocd.bin run -t 1000 -m 500 -f

-t 1000 means oprocd would wake up every 1000ms

-m 500 means allow upto 500ms margin of error

Basically with these options if oprocd wakes up after > 1.5 secs it’s going to force a reboot.

This is conceptually analogous to what hangcheck timer used to do pre 10.2.0.4 Oracle releases on Linux.

Fix is to set CSS diagwait to 13

#crsctl set css diagwait 13 -force

# /oracle/product/crs/bin/crsctl get css diagwait

13

This actually changes what parameters oprocd runs with

%ps -ef
grep oprocd

root 10409 9937 0 Feb27 ? 00:00:00 /oracle/product/crs/bin/oprocd.bin run -t 1000 -m 10000 -hsi 5:10:50:75:90 -f

Note that the margin has now changed to 10000ms i.e 10 seconds in place of the default 0.5 seconds.

PS: Setting diagwait requires a full shutdown of Oracle Clusterware on ALL nodes.

10. Cluster hung. All SQL queries on GV$ views are hanging.

Alert log from all instance have message like below:

INST1: IPC Send timeout detected. Receiver ospid 1650

INST2:IPC Send timeout detected.Sender: ospid 24692

Receiver: inst 1 binc 150 ospid 1650

INST3: IPC Send timeout detected.Sender: ospid 12955

Receiver: inst 1 binc 150 ospid 1650

The ospid on all instances belong to LCK0 - Lock Process

In case of inter-instance lock issues, it's important to identify the instance from where it's initiating.

As seen from above, INST1 is the one that needs to be fixed.

Just identify the process that is causing row cache lock and kill it otherwise reboot node 1.

11. Inconsistent OCR with invalid permissions

% srvctl add db -d testdb -o /oracle/product/10.2

PRKR-1005 : adding of cluster database testdb configuration failed, PROC-5: User does not have permission to perform a cluster registry operation on this key. Authentication error [User does not have permission to perform this operation] [0]

crs_stat doesn't have any trace of it so utilities like crs_setperm/crs_unregister/crs_stop won't work in this case.

ocrdump shows:

[DATABASE.LOG.testdb]

UNDEF :

SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

[DATABASE.LOG.testdb.INSTANCE]

UNDEF :

SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_ALL_ACCESS, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}

These logs are owned by root and that's the problem.

This means that the resource was perhaps added into OCR using root.

Though it has been removed by root but now it cannot be added by oracle user unless we get rid of the aforementioned.

Shutdown the entire cluster and either restore from previous good backup of OCR using:

ocrconfig -restore backupfilename

You can get list of backups using:

ocrconfig -showbackup

If you are not sure of last good backup, there you can also do the following:

Take export backup of OCR using:

ocrconfig -export /tmp/export -s online

Edit /tmp/export and remove those 2 lines pointing to DATABASE.LOG.testdb and DATABASE.LOG.testdb.INSTANCE owned by root

Import it back now

ocrconfig -import /tmp/export

After starting the cluster, verify using ocrdump.

The OCRDUMPFILE should not have any trace of those leftover log entries owned by root

Tuesday, January 11, 2011

Missing logs, Out-of SYNC (Physical standby)

Auto Sync, some time auto sync not happens result of which standby lag behinds, some steps below can make standby to sync with primary:

1.

On standby check

1. Select process,status from v$managed_standby; and this query should return mrp as process, if mrp is not coming means recovery process Is not active.

2. Try recover managed standby database cancel; and again try recover managed standby database using current logfile disconnect from session.

At primary check :

3. select current_scn from v$database;

4. rman > backup incremental from scn scn_number database format ‘/tmp/forSTANDBY_%U’ tag FORSTANDBY;

At standby :

Do recovery at standby as :

5. Rman nocatalog target /

6. RMAN> catalog backuppiece ‘/tmp/forSTANDBY_%U’;

7. RMAN> recover database noredo;

8. RMAN> delete backup tag ‘FORSTANDBY’;

9. Alter database managed standby database using current logfile disconnect from session;

10. Select process,status from v$managed_standby;

11. Select sequence#,applied from v$archived_log;

12. Select max(sequence#) from v$log and match if it is same as of primary;

2. Copy controlfile from primary to standby :

At primary :

1. RMAN>backup current controlfile for standby;

or

2. RMAN> backup current controlfile for standby format ‘/tmp/controflile.ctl’;

At standby :

3. RMAN>restore standby controlfile from ‘/tmp/controlfile.ctl’;

4. Sql> shutdown immediate

5. SQL> startup nomount

6. SQL> alter database mount standby database;

7. SQL> recover managed standby database disconnect from session;

8. Check everything running fine : if yes then perfect otherwise :

9. SQL>select name from v$datafile ; (check all datafiles )

10. RMAN> set newname for datafile “datafile_name” to “standby_location”; or

11. RMAN> catalog start with '+diskgroup//datafile/';

12. SQL> alter database add standby logfile group 4 ‘+standby_location’ size 50M; (add same no of standby logfiles as of online redo log files).

13. SQL>shutdown immediate

14. SQL>startup nomount

15. SQL>alter database mount standby database;

16. RMAN>recover database ; (check incarnation and scn should be same as of primary)

17. SQL>recover managed standby database using current logfile disconnect from session;

18. Check mrp is running or not.

19. Check log sequence# on both databases.

3. Register logfiles:

1. Locate the missing log file using SQL>SELECT SEQUENCE#, APPLIED FROM V$ARCHIVED_LOG;

2. Copy all the missing log files from primary archive destination to standby archive destination;

3. SQL>Alter database register logfile ‘log_file_path’;

4. Now check status of mrp process.

Monday, January 10, 2011

Max Protection vs Max Availability (Data Guard)

1. Whether log_archive_dest_2 is in lgwr,affirm or sync mode ,no matters about the above parameter , by default the protection mode would be max performance.

2. Max Availability will behave like max performance mode when no standby archive destinations available.
3. Max Availability will remain in max availability mode when standby redo logs will not available , in this case archiver process will come into picture.

4. Max protection mode will shutdown automatically if no standby destinations available for log transfer.for this we need atleast 2 log_archive_dest in order to aviod primary instance shutdown;

5. Max protection mode will shutdown if lgwr,async and noaffirm is used.

6. Max availability mode will behave like max performance mode if lgwr,async aand noaffirm is used , also it will use archiver (arch) process instead of lgwr.

7. Max availability mode will use archiver process for atleast gurantee the sync .

8. Max performance will use lgwr incase of lgwr,async,noaffirm instead of arch (in max availability ).

Monday, November 22, 2010

ORA-31634: job already exists

Error : Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
ORA-31634: job already exists
ORA-31664: unable to construct unique job name when defaulted

tail: cannot open input
Compressing exportfile..

Solution :
1. select owner_name,job_name,operation,state from DBA_DATAPUMP_JOBS ;
2. select owner_name,job_name,operation,state from DBA_DATAPUMP_JOBS where owner_name=’SCHEMA_NAME’; and output like
SCHEMA_NAME SYS_EXPORT_SCHEMA_60 EXPORT NOT RUNNING
SCHEMA_NAME SYS_EXPORT_SCHEMA_03 EXPORT NOT RUNNING
3. If output count of the above query Is 99 then
4. Either change the script and execute it using user "SYSTEM" instead of provided SCHEMA_NAME or connect to the provided schema and drop table SYS_EXPORT_SCHEMA_01; to drop table SYS_EXPORT_SCHEMA_99; (use the below script
5. select 'DROP TABLE '||owner||'.'||table_name||';'
from dba_tables
where table_name like '%SYS%EXPORT%';

'DROPTABLE'||OWNER||'.'||TABLE_NAME||';'
-------------------------------------------------------------------------
DROP TABLE SYSTEM.SYS_EXPORT_FULL_01;

6. Re-execute the script.

The reason was : if the enteries in DBA_DATAPUMP_JOBS table become equals to 99 for any particular schema then "ORA-31634 : job already exists" occurs so make sure the entries must be below 99 in order to avoid above error .

Monday, August 16, 2010

Migrating to ASM Using RMAN

Migrating to ASM Using RMAN
The following method shows how a primary database can be migrated to ASM from a disk based backup:
Disable change tracking (only available in Enterprise Edition) if it is currently being used.
SQL> ALTER DATABASE DISABLE BLOCK CHANGE TRACKING;
Shutdown the database.
SQL> SHUTDOWN IMMEDIATE
Modify the parameter file of the target database as follows:
Set the DB_CREATE_FILE_DEST and DB_CREATE_ONLINE_LOG_DEST_n parameters to the relevant ASM disk groups.
Remove the CONTROL_FILES parameter from the spfile so the control files will be moved to the DB_CREATE_* destination and the spfile gets updated automatically. If you are using a pfile the CONTROL_FILES parameter must be set to the appropriate ASM files or aliases.
Start the database in nomount mode.
RMAN> STARTUP NOMOUNT
Restore the controlfile into the new location from the old location.
RMAN> RESTORE CONTROLFILE FROM 'old_control_file_name';
Mount the database.
RMAN> ALTER DATABASE MOUNT;
Copy the database into the ASM disk group.
RMAN> BACKUP AS COPY DATABASE FORMAT '+disk_group';
Switch all datafile to the new ASM location.
RMAN> SWITCH DATABASE TO COPY;
Open the database.
RMAN> ALTER DATABASE OPEN;
Create new redo logs in ASM and delete the old ones.

Enable change tracking if it was being used.
SQL> ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;

Sunday, August 15, 2010

RMAN Backup Encryption

Oracle Database RMAN Backup Encryption

Purpose: To improve the Oracle backup security:

RMAN backups created as backup sets can be encrypted. Encrypted backups cannot be read if they are obtained by unauthorized people.

The minimum database version should be 10.2.0.

V$RMAN_ENCRYPTION_ALGORITHMS view contains a list of encryption algorithms supported by RMAN. If no encryption algorithm is specified, the default encryption algorithm is 128-bit AES.

RMAN Backup Encryption Modes:
1. Transparent Encryption of Backups
2. Password Encryption of Backups
3. Dual Mode Encryption of Backups

This concept of encryption/decryption also applies to Oracle RMAN (recovery manager) backup and recovery. The information needs to be encrypted using AES which encrypts data in 128,192,256 bits. It is necessary to set the encryption algorithm and supply password during recovery

RMAN> set encryption algorithm 'aes256' identified by password;
executing command: SET encryption
using target database control file instead of recovery catalog

RMAN> backup as compressed backupset database format \backup\oracle_backup_encryption.bkp';
Starting backup at 16-AUG-10
using channel ORA_DISK_1
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
.....
RMAN> set decryption identified by password;