Friday, November 29, 2013

PostgreSQL for Oracle DBAs


Introduction

The following article contains information to help an Oracle DBA understand some terms and the management of a PostgreSQL database. This article is intended to be an introduction to PostgreSQL, not a tutorial or a complete definition of how to administer a PostgreSQL database. For complete documentation refer to the PostgreSQL manuals.

Oracle

Brief description:

  • An Oracle database server consists of an Oracle instance and an Oracle database.
  • An Oracle instance consists of the Oracle background processes and the allocated memory within the shared global area (SGA) and the program global area (PGA).
  • The Oracle background processes consist of the following:
    • Database Writer Process (DBWn)
    • Log Writer Process (LGWR)
    • Checkpoint Process (CKPT)
    • System Monitor Process (SMON)
    • Process Monitor Process (PMON)
    • Recoverer Process (RECO)
    • Archiver Processes (ARCn)
  • An Oracle database consists of the database datafiles, control files, redo log files, archive log files, and parameter file.
  • To remotely access an Oracle database, there exists a separate process referred to as the Oracle listener.
  • In the Dedicated Server configuration (versus the Shared Server configuration) every established database session has its own process executing on the server.
To keep things simple, any comparisons with an Oracle database will always refer to a single instance managing a single database; RAC and Data Guard will not be covered. Note: PostgreSQL also has the concept of a warm standby (since 8.2) based on the shipping of archive logs (introduced in 8.0).

PostgreSQL

Database Server Processes

The database server program is postgres: all of the server processes run this one executable. There are no separately named processes, as in Oracle, for the different duties within the database environment; if you were to look at the process list (ps), the name of every process would be postgres. However, on most platforms PostgreSQL modifies its command title so that individual server processes can readily be identified. You may need to adjust the parameters of commands such as ps and top to show these updated titles instead of the process name ("postgres").
The processes seen in a process list can be some of the following:
  • Master process - launches the other processes, background and session processes.
  • Writer process - background process that coordinates database writes, log writes and checkpoints.
  • Stats collector process - background process collecting information about server activity.
  • User session processes.
The server processes communicate with each other using semaphores and shared memory to ensure data integrity throughout concurrent data access.

PostgreSQL Database Cluster

Within a server, one or more Oracle instances can be built; the databases are separate from one another, usually sharing only the Oracle listener process. PostgreSQL, by contrast, has the concept of a database cluster. A database cluster is a collection of databases stored at a common file system location (the "data area"). It is possible to have multiple database clusters, so long as they use different data areas and different communication ports.
The processes along with the file system components are all shared within the database cluster. All the data needed for a database cluster is stored within the cluster's data directory, commonly referred to as PGDATA (after the name of the environment variable that can be used to define it). The PGDATA directory contains several subdirectories and configuration files.
The following are some of the cluster configuration files:
  • postgresql.conf - Parameter or main server configuration file.
  • pg_hba.conf - Client authentication configuration file.
  • pg_ident.conf - Map from OS account to PostgreSQL account file.
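For illustration, a client-authentication entry in pg_hba.conf has the form shown below; the addresses and methods here are hypothetical examples, not recommended settings.

  # TYPE  DATABASE  USER  ADDRESS          METHOD
  local   all       all                    trust
  host    all       all   192.168.0.0/24   md5

Each line maps a connection type, database, user, and client address to an authentication method such as trust, md5, or ident.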
The cluster subdirectories:
  • base - Subdirectory containing per-database subdirectories
  • global - Subdirectory containing cluster-wide tables
    • pg_auth - Authorization file containing user and role definitions.
    • pg_control - Control file.
    • pg_database - Information of databases within the cluster.
  • pg_clog - Subdirectory containing transaction commit status data
  • pg_multixact - Subdirectory containing multitransaction status data (used for shared row locks)
  • pg_subtrans - Subdirectory containing subtransaction status data
  • pg_tblspc - Subdirectory containing symbolic links to tablespaces
  • pg_twophase - Subdirectory containing state files for prepared transactions
  • pg_xlog - Subdirectory containing WAL (Write Ahead Log) files
By default, for each database in the cluster there is a subdirectory within PGDATA/base, named after the database's OID (object identifier) in pg_database. This subdirectory is the default location for the database's files; in particular, its system catalogs are stored there. Each table and index is stored in a separate file, named after the table or index's filenode number, which can be found in pg_class.relfilenode.
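As an illustration, the on-disk location of a table can be derived from the system catalogs; the table name accounts below is hypothetical.

  -- OID of the current database = the subdirectory name under PGDATA/base
  SELECT oid FROM pg_database WHERE datname = current_database();
  -- filenode of the table = the file name within that subdirectory
  SELECT relfilenode FROM pg_class WHERE relname = 'accounts';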
Several components that Oracle DBAs usually equate to one database are shared between databases within a PostgreSQL cluster, including the parameter file, control file, redo logs, tablespaces, accounts, roles, and background processes.

Tablespaces and Object Data Files

PostgreSQL introduced tablespace management in version 8.0. The physical representation of a tablespace within PostgreSQL is simple: it is a directory on the file system, and the mapping is done via symbolic links.
When a database is created, its default tablespace is where all of the database objects are stored unless another tablespace is specified. In Oracle this would be similar to the System, User, and Temporary tablespaces. If no default tablespace is defined during creation, the data files go into a subdirectory of PGDATA/base. Preferably, the system catalog information and the application data structures would reside in separately managed tablespaces; PostgreSQL makes this possible.
As in Oracle, the definition of a PostgreSQL table determines which tablespace the object resides in. However, there is no size limitation other than the physical boundaries placed on the device by the OS.
An individual table's data is stored in a file within the tablespace (or directory). The database software splits the table across multiple data files when the table's data surpasses 1 GB.
Since version 8.1, it's possible to partition a table over separate (or the same) tablespaces. This is based on PostgreSQL's table inheritance feature, using a capability of the query planner referred to as constraint exclusion.
There exists no capacity for separating out specific columns (like LOBs) into separately defined tablespaces. However, in addition to the data files that represent the table (in multiples of 1 GB) there is a separation of data files for columns within a table that are TOASTed. The PostgreSQL storage system called TOAST (The Oversized-Attribute Storage Technique) automatically stores values larger than a single database page into a secondary storage area per table. The TOAST technique allows for data columns up to 1 GB in size.
As in Oracle, the definition of an index determines which tablespace it resides in. Therefore, it is possible to gain a performance advantage by separating the disks on which a table's data and its indexes reside, relieving I/O contention during data manipulation.
In Oracle there are temporary tablespaces where sort information and the temporary evaluation space needed for DISTINCT statements and the like are kept. PostgreSQL does not have this concept of a temporary tablespace; however, it does require storage to perform these activities as well. Within the "default" tablespace of the database (defined at database creation) there is a directory called pgsql_tmp that holds this temporary storage. The files created within the directory exist only while the SQL statement is executing. They grow very fast and are most likely designed for speed rather than space efficiency. Be aware that disk fragmentation can result from this, and there needs to be sufficient space on the disk to support the user queries. With the release of 8.3, temporary tablespaces can be defined explicitly using the temp_tablespaces parameter.

REDO and Archiving

PostgreSQL uses Write-Ahead Logging (WAL) as its approach to transaction logging. WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, when log records describing the changes have been flushed to permanent storage. If we follow this procedure, we do not need to flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using the log: any changes that have not been applied to the data pages can be redone from the log records. (This is roll-forward recovery, also known as REDO.)
PostgreSQL maintains its write-ahead log in the pg_xlog subdirectory of the cluster's data directory.
WAL was introduced into PostgreSQL in version 7.1. To maintain database consistency in case of a failure, previous releases forced all data modifications to disk before each transaction commit. With WAL, only one log file must be flushed to disk, greatly improving performance while adding capabilities like Point-In-Time Recovery and transaction archiving.
A PostgreSQL system theoretically produces an indefinitely long sequence of WAL records. The system physically divides this sequence into WAL segment files, which are normally 16MB apiece. The system normally creates a few segment files and then "recycles" them by renaming no-longer-needed segment files to higher segment numbers. If you were to perform a listing of the pg_xlog directory there would always be a handful of files changing names over time.
To enable archiving of the WAL files, a command that performs the archival is set in a parameter in the server configuration file. Once this is done, operating-system-level "on-line" backups become available by executing the pg_start_backup and pg_stop_backup functions, which mark the beginning and end of the backup while transactions continue to be written to the WAL files and archived.
Inclusion of WAL archiving and the on-line backup commands were added in version 8.0.
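A minimal on-line backup sequence might look like the following sketch; it assumes archive_command is already configured, and the tar destination path is hypothetical.

  psql -c "SELECT pg_start_backup('nightly');"
  tar czf /backup/pgdata.tar.gz $PGDATA    # copy the data area with any file-level tool
  psql -c "SELECT pg_stop_backup();"

Between the two calls the cluster keeps running normally; the backup is made consistent at restore time by replaying the archived WAL.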

Rollback or Undo

It is interesting how disk space is dynamically allocated for the storage and processing of records within tables. The files that represent a table grow as the table grows, and they also grow with the transactions performed against it. In Oracle there is a concept of rollback or undo segments that hold the information for rolling back a transaction. In PostgreSQL that data is stored within the files that represent the table itself, so when deletes and updates are performed on a table, its files still contain the previous data. This space gets reused, but to force recovery of used space a maintenance process called vacuum must be executed.
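For example, space left behind by updates and deletes can be made reusable with the vacuum command; the table name mytable is hypothetical.

  VACUUM mytable;           -- reclaim space in one table for reuse
  VACUUM ANALYZE;           -- whole database, also refreshing planner statistics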

Server Log File

Oracle has the alert log file; PostgreSQL has the server log file. A configuration option can even make the connection information we normally see in Oracle's listener.log appear in PostgreSQL's server log. The parameters in the server configuration file (postgresql.conf) determine the level, location, and name of the log file.
To help with the maintenance of the server log file (it grows rapidly), there exists functionality for rotating the server log file. Parameters can be set to determine when to rotate the file based on the size or age of the file. Management of the old files is then left to the administrator.
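A sketch of the relevant postgresql.conf settings; the values shown are illustrative, not recommendations.

  redirect_stderr = on           # capture server messages to log files (renamed logging_collector in 8.3)
  log_directory = 'pg_log'       # relative to PGDATA
  log_rotation_age = 1440        # rotate after this many minutes (one day)
  log_rotation_size = 10240      # or after this many kilobytes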

Applications

The command initdb creates a new PostgreSQL database cluster.
The command psql starts the terminal-based front end to PostgreSQL, an SQL command prompt. Queries and commands can be executed interactively or from files. The psql command prompt has several attractive features:
  • Thorough on-line help for both the psql commands and the SQL syntax.
  • Command history and line editing.
  • SQL commands can span multiple lines and are executed only after the terminating semi-colon (;).
  • Several SQL commands separated by semi-colons can be entered on a single line.
  • Flexible output formatting.
  • Multiple object description commands that are superior to Oracle's DESCRIBE.
Depending on the security configuration of the environment, connections can be established locally or remotely through TCP/IP, and passwords may or may not be required to connect.
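A short psql session illustrating some of these features; the database and table names are hypothetical.

  $ psql mydb
  mydb=# \h CREATE INDEX      -- on-line help for SQL syntax
  mydb=# \d accounts          -- describe a table, similar to (and richer than) Oracle's DESCRIBE
  mydb=# SELECT count(*)
  mydb-# FROM accounts;       -- the statement runs only at the semi-colon
  mydb=# \q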
The command pg_ctl is a utility for displaying status, starting, stopping, or restarting the PostgreSQL database server (postgres). Although the server can be started through the postgres executable, pg_ctl encapsulates tasks such as redirecting log output, properly detaching from the terminal and process group, and providing options for controlled shutdown.
The commands pg_dump and pg_restore are utilities designed for exporting and importing the contents of a PostgreSQL database. Dumps can be output in either script or archive file formats. The script format creates plain-text files containing the SQL commands required to reconstruct the database to the state it was in at the time the dump was generated. The archive format creates a file to be used with pg_restore to rebuild the database.
The archive file formats are designed to be portable across architectures. Historically, any upgrade of the PostgreSQL software required a pg_dump of the database before the upgrade and a pg_restore afterwards. Now, upgrades within a minor release (i.e., the third number, as in 8.2.x) can be done in place; changing the version at the first or second number still requires a pg_dump/pg_restore.
There is a graphical tool called pgAdmin III, developed separately and distributed with the Linux and Windows versions of PostgreSQL. It can connect to a database server remotely to perform administrative duties. Because the tool is designed to manage all aspects of the database environment, the connection to the database must be made through a superuser account.
The pgAdmin III tool has the following standard attractive features:
  • Intuitive layout
  • Tree structure for creating and modifying database objects
  • Reviewing and saving of SQL when altering or creating objects

Why do we need schemas in a database?

There are several reasons why one might want to use schemas:
  • To allow many users to use one database without interfering with each other.
  • To organize database objects into logical groups to make them more manageable.
  • Third-party applications can be put into separate schemas so they do not collide with the names of other objects.
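As an illustration, a per-application schema can be created and addressed as follows; the schema and table names are hypothetical.

  CREATE SCHEMA app1;
  CREATE TABLE app1.customers (id integer, name text);
  SET search_path TO app1, public;   -- unqualified names now resolve to app1 first
  SELECT * FROM customers;           -- reads app1.customers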

What is Scalar Subquery?

Answer: A scalar subquery is an ordinary SELECT query in parentheses that returns exactly one row with one column.

Example:

SELECT fieldname, (SELECT max(fieldname) FROM tableB WHERE tableB.fieldname = tableA.fieldname)
    FROM tableA;

Tuesday, November 12, 2013

PostgreSQL Replication and Hot Standby – In Practice

This article describes how to set up PostgreSQL replication and Hot Standby on two server instances running CentOS. I spent a good chunk of time learning these tricks over the past two weeks, in order to implement a High Availability solution for a PostgreSQL database server.

The streaming replication technique used here has been available in PostgreSQL since version 9.0 (this article uses 9.2). There are several blog posts and a few books that describe it, but they don't explain everything that is necessary. I found the explanation of these setups in "PostgreSQL 9 Administration Cookbook" very useful. I tried all the replication recipes described in that book on two VMs on my laptop. After going through 7 different database clusters, each used for testing a different recipe, I got a good handle on things.

Setup

The instructions here use a CentOS 6.4 64-bit server with PostgreSQL 9.2.5 (any PostgreSQL 9.2.x version will do as well).
We are going to prepare two hosts, one master and one standby. That separation is important if your goal is High Availability: if one of the hosts goes down, you can bring up your application servers in another zone where your standby DB server is running. You can also put the hosts in two different regions, but that will cost you more in networking bandwidth.

Preparation

Install the following packages on CentOS:

postgresql-9.2
postgresql-client-9.2
postgresql-contrib-9.2
 

postgres user account

Installing these packages will automatically create the postgres user account.
All following commands are supposed to be executed as the postgres user. For security reasons this account doesn't have a password (like the root account on CentOS). If you are working in a test environment you can set a password for postgres with
 
 # passwd postgres

Then perform all following commands after logging in as postgres:

 # su postgres

If you are on a production server, though, you may want to leave postgres passwordless. Instead, run all following commands with sudo -u postgres prepended. I am going to omit it from every command for convenience.


Password-less ssh logins

The MASTER server should be able to access the STANDBY over ssh without a password. More accurately, the postgres account on the MASTER should be able to access the postgres account on the STANDBY without a password. This is needed for the rsync operation in the base-backup step (explained below).
You can do it using ssh-copy-id utility.
On master run
ssh-copy-id <IP_OF_STANDBY_SERVER>

You can also specify a user in the above command, but I deliberately omitted it because it is recommended that you perform these operations under the same user account, postgres, on both machines. Omitting the user makes the command log in to the remote machine with the same username you are currently logged in as (or the one given to sudo -u).
In case of failover, the MASTER and STANDBY servers will switch roles, so also run ssh-copy-id from the current STANDBY server so that it can access the current MASTER server without a password.

CentOS specific PostgreSQL conventions

On CentOS a PostgreSQL database instance is called a cluster. Don't confuse it with a multi-node configuration of servers, or with the SQL CLUSTER command. It's simply an instance of the PostgreSQL server that runs on a particular port and saves its data in its own directory (the default being /var/lib/pgsql/9.2/data, also set in the PGDATA environment variable in various scripts). Note that the pg_createcluster, pg_ctlcluster, and pg_lsclusters wrapper commands come from Debian-family packaging rather than CentOS; on CentOS you start, stop, and reload the instance with the service scripts or pg_ctl.
Each cluster also has its own set of configuration files; on CentOS they live in the cluster's data directory. We will need to modify only two of them: pg_hba.conf and postgresql.conf.

For keeping things simple, it's better to ensure that $PGDATA is the same on both MASTER and STANDBY servers. (Side note: on a production server $PGDATA is expected to be mounted on a separate volume.)

Replication with Streaming Log Shipping and Hot Standby

There are many different configurations you can follow to achieve replication across two PostgreSQL instances, each with its own benefits and pitfalls. The technique we are going to use is known as 'Streaming Log Shipping' with 'Hot Standby' in the Admin cookbook I referred to earlier. This setup seems optimal in terms of immediate replication (i.e., the smallest window of data loss) and minimal networking traffic (hence cheaper).
This is a Master-Slave configuration. The Slave, or STANDBY, server can be used to query the database, but you cannot make any writes to it; this is NOT what is known as a "Multi Master" configuration. The Master accepts all read and write queries from your application servers. The Standby server copies new modifications from the Master with minimal delay and is available only for read-only queries while in Standby mode. When the Master goes down, or we bring it down for other reasons, we can tell the Standby server to become the Master by creating a touch file (explained later). After this point the ex-Standby server stops following the original Master and is ready to accept read-write queries; it is then the new Master.

Let’s start

If you already have a PostgreSQL database running on a production server, that will be your current MASTER server. Note that we are going to configure both MASTER and STANDBY machines as identically as possible, because you will want to make either of them the MASTER in case the other goes down, and switch back when it comes back up.
Let's start with the current MASTER running and the STANDBY stopped.
On the MASTER, create the replicator user. The STANDBY server will log in to the MASTER using this account to read the latest changes that it has to replicate. You should not do this step on the current STANDBY server, even though it may at some point become the new Master: during the base-backup step the replicator role will be copied to the STANDBY automatically.

psql -c "CREATE USER replicator SUPERUSER LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'changeme';"
 
Modify pg_hba.conf on both MASTER and STANDBY server by adding this line. This tells respective PostgreSQL instances to accept connection from other node for replication purposes.
Modify pg_hba.conf on both MASTER and STANDBY servers by adding this line. It tells the respective PostgreSQL instances to accept connections from the other node for replication purposes.

host replication replicator <IP_OF_OTHER_HOST>/32 md5

Modify postgresql.conf on both MASTER and STANDBY servers and add the following lines. (Check whether these options are already set to different values elsewhere in the file.)

hot_standby = on
max_wal_senders = 1
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .'
listen_addresses = 'localhost,<IP_ADDRESS_OF_THIS_NODE>'

At this point restart the MASTER. Do NOT start the STANDBY yet.
Next, we are going to perform a base backup from MASTER to STANDBY:

BACKUP_LABEL="base-backup"
psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
psql -p $PORT -c "select pg_stop_backup();"

It's advised that you put this in a bash script so that it can be run quickly and repeatedly without errors.
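A sketch of such a script; $PORT, $PGDATA, and the standby address are placeholders to adapt to your environment.

  #!/bin/bash
  # Run on the MASTER as the postgres user.
  PORT=5432
  BACKUP_LABEL="base-backup"
  psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
  rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
  psql -p $PORT -c "select pg_stop_backup();"

Note that as written pg_stop_backup runs even if rsync reports errors; in a more careful script you would check rsync's exit status first.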
On the STANDBY, create a recovery.conf file in the $PGDATA directory and add the following lines to it:

standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'

Now start the STANDBY server.
At this point you can check the log file (/var/log/pgsql/postgresql-9.2-<cluster-name>.log) to verify things
  • On the STANDBY, look for the message "streaming replication successfully connected to primary".
  • You should also see a "wal receiver process" on the STANDBY and a "wal sender process" on the MASTER.
  • Finally, run some queries against the STANDBY server to verify that it contains the same data as the MASTER.

How to do Failover/Switchover

Setting up the system for high availability is of no use if you wait for an actual disaster to strike before trying the recovery. Therefore you should try the above steps in a test environment and simulate disasters.
If the MASTER is not down, make sure you stop it first, before you tell the STANDBY to take up that role. This prevents the old MASTER from processing further queries, which would lead to a split-brain problem.
You can turn the STANDBY into a MASTER by simply touching a trigger file that was mentioned in the recovery.conf, /tmp/postgresql.trigger.$PORT.
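For example, with the recovery.conf shown earlier:

  touch /tmp/postgresql.trigger.$PORT    # the STANDBY notices this file and promotes itself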
Now that the STANDBY has turned into the MASTER, point your application servers to it. Even if your old MASTER is running at this point, the new MASTER is not going to replicate any changes from it. That is why it is necessary to stop the old MASTER before you ask the STANDBY to become the new MASTER.
You can tell that STANDBY has become MASTER from the messages in the log that read “archive recovery complete. database system is ready to accept connections.”

How to do Switchback

After some downtime or a maintenance period, your master node is back up again and you want to switch back. First turn this node into a standby; in this mode it will catch up with the current MASTER, replicating the changes that took place while it was down. So we refer to it as the current STANDBY now.
Perform a base backup from the current MASTER to the current STANDBY:
BACKUP_LABEL="base-backup"

psql -p $PORT -c "select pg_start_backup('$BACKUP_LABEL');"
rsync -cva --inplace --exclude=*pg_xlog* $PGDATA/ <IP_OF_OTHER_HOST>:$PGDATA/
psql -p $PORT -c "select pg_stop_backup();" 
 
Create recovery.conf in $PGDATA on current STANDBY
 
 
standby_mode = 'on'
primary_conninfo = 'host=<IP_OF_OTHER_HOST> port=$PORT user=replicator password=changeme'
trigger_file = '/tmp/postgresql.trigger.$PORT'
 
After the catch-up is over, you can turn the current STANDBY into the MASTER by following the above switchover procedure – touch the trigger file.

failover and switchover in postgresql

Question:

For example, I have two servers: the first is the primary server and the second is the slave server. I need the slave server to become the primary (temporarily) while the primary server is down. Then, once the old primary is fixed, I need the old primary to be the primary server again and the old slave (the temporary primary) to become the slave again. Please explain how to solve this problem.

Answer:

After performing a failover, the old slave becomes the new master, but the opposite is not automatic: the old master does not become the slave by itself. We will need to set up replication from the new master to the to-be slave to be able to switch back.

Asynchronous/Synchronous Streaming Replication in PostgreSQL 9.2

PostgreSQL 9.2 has added many functions and new features, so I thought I would blog about each of them.
Today I am discussing the synchronous and asynchronous replication supported in PostgreSQL 9.2.
For asynchronous replication, the user can use the following method:
1. Change the following parameters in the postgresql.conf file of the primary database:

 - archive_command = 'cp -i %p /Users/postgres/archive/%f'
 - archive_mode = on
 - max_wal_senders = 3 # maximum number of 'wal sender' processes, which manage connections to standby servers
 - wal_keep_segments = # how many WAL segments (= files) should be kept on the primary, whatever may happen (you won't have to copy them manually to the standby if it gets too far behind)
 - wal_level = hot_standby
2. Make the following changes in pg_hba.conf:

 - host    replication     postgres        [Ipv4 address of Standby Server]/32 trust
 - host    replication     postgres        [Ipv4 address of Master Server]/32  trust
3. Restart the PostgreSQL cluster using pg_ctl as given below:

 $ pg_ctl -D [data directory path] restart -m fast
4. Take a base backup of the primary PostgreSQL server using the pg_basebackup command on the standby (pg_basebackup was introduced in PostgreSQL 9.1):

 $ pg_basebackup -D /Users/vibhor/testbackup -v -Fp -l Testbackup -h 127.0.0.1 -U postgres
The syntax of pg_basebackup is given below:

 $ pg_basebackup -D <local directory> -v -Fp -l [backup label] -h [PG server hostname/ip] -U superuser
For more options, the user can use the following command:
 
pg_basebackup --help
5. Create a recovery.conf file and include the following parameters:

restore_command = 'cp -i /Users/postgres/archive/%f %p'     # e.g. 'cp /mnt/server/archivedir/%f %p'
standby_mode = on
primary_conninfo = 'host=localhost port=5432'           # e.g. 'host=localhost port=5432'
trigger_file = '/tmp/makeprimary.trigger'
6. Change the following parameter in the postgresql.conf file of the standby:

hot_standby = on   # if you want to use Hot Standby at the same time
7. Then start the standby using the following command:

 $ pg_ctl -D [standby directory] start
To verify the asynchronous replication, the user can use the following command on the primary:

postgres=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;
 procpid | usesysid | usename  | application_name | client_addr |   state   | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
    3661 |       10 | postgres | walreceiver      | ::1         | streaming | 0/D0001D0     | 0/D0001D0      | async
(1 row)
To switch asynchronous replication to synchronous replication, use the following steps:
1. Change the following parameter in postgresql.conf on the primary server:

synchronous_standby_names = 'sync_replication'
2. Reload the above changes on the primary using the following command:

pg_ctl -D [primary data directory] reload
3. Change the following parameter in the recovery.conf file on the standby:

primary_conninfo = 'host=localhost port=5432 application_name=sync_replication'
4. Restart the standby using the following command:
pg_ctl -D [standby directory] restart -m fast
To verify the switch from asynchronous to synchronous, the user can use the following command on the primary:

postgres=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;
 procpid | usesysid | usename  | application_name | client_addr |   state   | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
    3830 |       10 | postgres | sync_replication | ::1         | streaming | 0/E000078     | 0/E000078      | sync
(1 row)
PostgreSQL 9.2 also gives the flexibility of controlling synchronous replication per session. So, if you want one transaction or session to be replicated asynchronously, you can set the following parameter in that session on the primary:
In Transaction:
BEGIN;
  set synchronous_commit=false;
END;
In Session:

set synchronous_commit=false;