author_profile_image

mariangela

Updated on July 08, 2024

Category:

env

Tags:

ACID

MVCC

Atomicity

Postgresql

psql

psycopg2

python3-dev

libpq-dev

RDBMS

relational database

Install PostgreSQL in your Django application

What is PostgreSQL. What ACID and MVCC mean. How to interact with PostgreSQL databases from the CLI with psql and from your Python application with psycopg2

PostgreSQL is an open-source relational database management system (RDBMS) trusted by many large-scale applications and organizations for various purposes including web applications, data warehousing, and analytics.

Among Postgres’ many properties, it’s worth mentioning the following:

  • PostgreSQL ensures transactions are Atomic, Consistent, Isolated, and Durable (ACID).
  • Supports complex data types such as arrays, hstore (key-value pairs), JSON, and XML.
  • Utilizes Multiversion Concurrency Control (MVCC) to handle multiple transactions simultaneously.
  • Provides robust security features like authentication, authorization, SSL, and advanced role management.

What does ACID mean?

ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These are the key properties that ensure reliable processing of database transactions.

  • ATOMICITY: each transaction is treated as a single "unit," which either completely succeeds or completely fails. If any part of the transaction fails, the entire transaction is rolled back, and the database state is left unchanged.
  • CONSISTENCY: data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
  • ISOLATION: transactions occur independently without interference from other transactions. The intermediate states of a transaction are invisible to other transactions, and concurrent transactions appear as though they were executed sequentially.
  • DURABILITY: once a transaction has been committed, its results are permanently recorded in the database.

What is Multiversion Concurrency Control (MVCC)?

Multiversion Concurrency Control (MVCC) is a database management technique used to handle concurrent access to the database in a way that avoids conflicts and ensures data consistency.

MVCC allows multiple transactions to occur simultaneously without locking the data, instead MVCC maintains multiple versions of each data item (row); so, when a transaction makes changes to a row, it creates a new version of that row rather than modifying the existing one.

Each transaction sees a snapshot of the database at a particular point in time. This snapshot includes only the committed data as of the start of the transaction, along with any changes made by the transaction itself.

When a transaction commits, its changes are made visible to other transactions that start afterward. If a transaction is rolled back, the changes are discarded, and the database remains as it was before the transaction began.

The database periodically removes old versions of data that are no longer needed, which is known as garbage collection. This process ensures that the database does not grow indefinitely.

Install PostgreSQL

You need to install this only once per Ubuntu distro. If you add a new Ubuntu distro then you’ll have to repeat this step.

sudo apt install postgresql postgresql-contrib

Check permissions

cd /var/lib
ls -la
01_check_postgres_permissions

PostgreSQL should have the owner and group as postgres. If this is not the case, run the following command

chown -R postgres:postgres postgresql

Start the database server

sudo service postgresql start
02_start_postgres

Check Postgres’ status

pg_lsclusters
03_check_postgres_cluster

or with the following command

service postgresql status
04_check_postgres_status

Log in to psql

psql is a command-line interface (CLI) tool for interacting with PostgreSQL.

The installation of PostgreSQL on Linux results in the creation of the following:

  • a new Linux user called postgres;
  • a database named postgres owned by user postgres.
05_linux_users 06_postgres_databases

To list the Linux users, run either command:

compgen -u
 cut -d: -f1 /etc/passwd

At this point we haven’t created our PostgreSQL user yet nor a new database, therefore we can only log in to the postgres database as the postgres user.

Run the following command to log in to Postgres with psql.

The -U flag means user, the -d flag means database:

psql -U postgres -d postgres

This will result in an error

07_postgres_peer_authentication_failed

The reason is peer authentication

cd /etc/postgresql/16/main/
sudo nano pg_hba.conf
08_postgres_peer_authentication_method

Peer authentication means that the connection is only allowed if the name of the PostgreSQL user is the same as the name of the operating system (Ubuntu) user.

So, if you run psql -U postgres as Ubuntu user root or your_username (e.g. md in my case), it won't work.

The solution is to specify a mapping between operating system users and database users in a Postgres file called pg_ident.conf

cd /etc/postgresql/16/main/
sudo nano pg_ident.conf
09_postgres_name_mapping

Save and close the file with Ctrl+x, then type in y and then hit the Enter key.

We need to reference this mapping in another file called pg_hba.conf

sudo nano pg_hba.conf
10_postgres_pg_hba_conf

Save and close the file.

Restart the service so that the changes take effect:

sudo service postgresql restart

If you try to log in, now it should work. Either command works:

psql -U postgres -d postgres
psql -U postgres
11_log_in_with_psql

Once you’re logged in, run the following command to check username you’re logged in as, and database you’re logged in to

\conninfo

To exit psql:

\q

You may have noticed that I added “postgres” to the pg_ident.conf file. This ensures that I can switch between the “mapping” option and the alternative methods that we’re about to learn. If I don’t add “postgres”, the alternative methods won’t work without disabling the mapping first.

If you do not want to change the pg_ files, you can alternatively log in to psql by switching to the postgres user with the sudo command which elevates privileges to execute commands as root.

The postgres Linux user was automatically created when PostgreSQL was installed.

The command below switches the current user to the postgres user and opens a new shell session.

sudo su postgres
psql

If you don’t specify a database with the -d flag, the psql command logs you in a database that has the same name as the user, in this case the postgres database (because you are logging in as the postgres user).

\conninfo
\q
exit
12_sudo_su_postgres

Another alternative is:

sudo -u postgres psql

The command above runs the psql command directly as the postgres user without opening a new shell session.

\conninfo
\q
13_sudo_u_postgres_psql

Create a new PostgreSQL user

Log in to psql

sudo -u postgres psql
CREATE ROLE your_username LOGIN SUPERUSER INHERIT CREATEDB CREATEROLE;

Here's a breakdown of what each part of the command does:

  • LOGIN: The role can be used to log into the PostgreSQL server.
  • SUPERUSER: The role has the highest level of privileges and can perform any action without restriction.
  • INHERIT: The role will inherit privileges from any roles it is a member of, simplifying privilege management.
  • CREATEDB: The role can create new databases, which is useful for administrative tasks.
  • CREATEROLE: The role can create, modify, and delete other roles, allowing it to manage user access and permissions.

ALTER ROLE your_username SET client_encoding TO 'utf8';

This command sets the default client encoding for a specific role. This ensures that whenever the specified role connects to the PostgreSQL server, the client encoding is automatically set to UTF-8, which is a widely used character encoding for Unicode.

All text data sent to and received from the PostgreSQL server by clients using this role will be encoded in UTF-8. This helps maintain consistency in how text data is handled, especially in applications that use multiple languages or special characters.

Setting a default client encoding can help avoid issues where the client and server might use different encodings, which could lead to data corruption or incorrect display of text.

ALTER ROLE your_username SET default_transaction_isolation TO 'read committed';

The command above sets the default transaction isolation level to 'read committed' for the specified role, ensuring that all transactions initiated by this role adhere to this isolation level unless overridden.

Transaction Isolation Levels in PostgreSQL

  • Read Committed: This is the default isolation level in PostgreSQL. In this mode, a query in a transaction sees only data that was committed before the query began. It does not see changes made by concurrent transactions that have not yet been committed.
  • Repeatable Read: This isolation level ensures that if a query is executed again within the same transaction, it will see the same rows, even if other transactions are concurrently modifying the data.
  • Serializable: This is the strictest isolation level, ensuring complete isolation from concurrent transactions. It guarantees that the transaction operates as if it were the only transaction in the system.
  • Read Uncommitted: This isolation level allows a transaction to see uncommitted changes made by other transactions. PostgreSQL does not actually support this level; it maps it to Read Committed.

ALTER ROLE your_username WITH PASSWORD 'your_password';
\q

I’m going to log back in to psql as my new user “md”

psql

This command will result in an error because there is no database named “md”. We need to specify the database name. Let’s log in to the only database available at this point, that is “postgres”

14_psql_error
psql -d postgres
15_postgres_log_in_with_new_username

To delete a user (or role):

DROP ROLE md;

Create a new PostgreSQL database

I’m logging into the postgres database as user md and going to create a new database named my_db

psql -d postgres
CREATE DATABASE my_db;

With the /l command you can see the list of all databases

16_list_of_databases

Switch database:

\connect my_db;
17_switch_database

Exit psql

\q

To delete a database:

DROP DATABASE my_db;

Install psycopg2

So far, we have interacted with PostgreSQL via the terminal using psql.

To allow our Python/Django application to interact with our PostgreSQL database, we need to install psycopg2.

Psycopg2 is a PostgreSQL database adapter for the Python programming language. It is a powerful and flexible library for interacting with PostgreSQL databases from Python applications.

Before installing psycopg2, we need to install in Ubuntu two development packages: python3-dev and libpq-dev.

These two packages ensure that your system has all the necessary tools and libraries to compile and install psycopg2 and other similar Python packages that rely on C extensions and PostgreSQL libraries.

sudo apt-get update && sudo apt-get upgrade
sudo apt install python3-dev libpq-dev
pip3 install psycopg2

If you want to install a specific version of psycopg2:

pip3 install psycopg2==version_number