/munetakaJupyterHub/Computer/PostgreSQL/PostgreSQL-first-experience/.ipynb

CREATE TABLE courses( -- PRIMARY KEY constraint, which specifies that the values in that column must be unique and cannot be null c_no text PRIMARY KEY, title text, hours integer )

INSERT INTO courses(c_no, title, hours) VALUES ('CS301', 'Databases', 30), ('CS305', 'Networks', 60);

CREATE TABLE students( -- PRIMARY KEY constraint, which specifies that the values in that column must be unique and cannot be null s_id integer PRIMARY KEY, name text, start_year integer );

INSERT INTO students(s_id, name, start_year) VALUES (1451, 'Anna', 2014), (1432, 'Victor', 2014), (1556, 'Nina', 2015);

-- INSERT 0 3 means that there were no errors during the INSERT statement and that three rows were successfully inserted into the students table.

CREATE TABLE exams( s_id integer REFERENCES students(s_id), c_no text REFERENCES courses(c_no), score integer, CONSTRAINT pk PRIMARY KEY(s_id, c_no) );

This is a CREATE TABLE statement in PostgreSQL that is used to create a new table named exams. The exams table has three columns:

s_id is an integer type column that references the s_id column in the students table.
c_no is a text type column that references the c_no column in the courses table.
score is an integer type column that stores the score of the student for a particular course.

In addition to the three columns, there is a CONSTRAINT named pk that defines a composite primary key on the s_id and c_no columns. This means that the combination of s_id and c_no values must be unique in the exams table. The REFERENCES clauses in the s_id and c_no columns specify that these columns reference the s_id and c_no columns in the students and courses tables respectively, and therefore they act as foreign keys.

The REFERENCES clause ensures that the values in the s_id and c_no columns in the exams table must match the values in the s_id and c_no columns in the students and courses tables respectively. This helps to maintain referential integrity between the exams, students, and courses tables.

Overall, this CREATE TABLE statement defines a table schema that can be used to store information about the exams taken by students for various courses. The composite primary key constraint ensures that there can be no duplicate entries for the same student and course combination.

Summary;

s_id integer REFERENCES students(s_id), requires that s_id exists in students table
c_no text REFERENCES courses(c_no), requires that c_no exits in courses table,
while CONSTRAINT pk PRIMARY KEY(s_id, c_no), requires that the combination of s_id and c_no values must be unique in the exams table

Yes, that's a good summary! To be more precise, the s_id and c_no columns in the exams table have a referential integrity constraint with the students and courses tables respectively. This constraint ensures that the values in the s_id column of the exams table must exist in the s_id column of the students table, and the values in the c_no column of the exams table must exist in the c_no column of the courses table.

Additionally, the pk constraint specifies that the combination of values in the s_id and c_no columns of the exams table must be unique, which means that a student can take a course only once and cannot have duplicate scores for the same course.

INSERT INTO exams(s_id, c_no, score) VALUES (1451, 'CS301', 5), (1556, 'CS301', 5), (1451, 'CS305', 5), (1432, 'CS305', 4);

¶Data Retrieva

test=> SELECT title AS course_title, hours FROM courses; course_title | hours --------------+------- Databases | 30 Networks | 60 (2 rows)

test=> SELECT start_year FROM students; start_year

   2014
   2014
   2015

(3 rows)

test=> SELECT start_year FROM students;test=> SELECT students.name, exams.score FROM students LEFT JOIN exams ON students.s_id = exams.s_id AND exams.c_no = 'CS305'; name | score --------+------- Anna | 5 Victor | 4 Nina |
(3 rows)

start_year

   2014
   2014
   2015

(3 rows)

test=> SELECT DISTINCT start_year FROM students; start_year

   2014
   2015

(2 rows)

test=> SELECT 2+2 AS result; result

(1 row)

test=> SELECT * FROM courses WHERE hours > 45; c_no | title | hours -------+----------+------- CS305 | Networks | 60 (1 row)

The condition must be of a logical type. For example, it can contain relations =, <> (or !=), >, >=, <, <=, as well as combine simple conditions using logical operations AND, OR, NOT, and parenthesis (like in regular programming languages).

andling NULLs is a bit more subtle. The resulting table can contain only those rows for which the filtering condition is true; if the condition is false or undefined, the row is ex- cluded. Note the following: • The result of comparing something to NULL is undefined. • The result of logical operations on NULL is usually unde- fined (exceptions: true OR NULL = true, false AND NULL = false). • The following special conditions are used to check whether the value is undefined: IS NULL (IS NOT NULL) and IS DISTINCT FROM (IS NOT DISTINCT FROM). It may also be convenient to use the coalesce function. You can find more details in documentation: postgrespro. com/doc/functions-comparison.html

¶Joins

A well-designed database should not contain redundant data. For example, the exams table must not contain student names, as this information can be found in another table by the number of the student ID card.

test=> SELECT * FROM exams; s_id | c_no | score ------+-------+------- 1451 | CS301 | 5 1556 | CS301 | 5 1451 | CS305 | 5 1432 | CS305 | 4 (4 rows)

Cartesian product or Cartesian product

test=> SELECT * FROM courses, exams; c_no | title | hours | s_id | c_no | score -------+-----------+-------+------+-------+------- CS301 | Databases | 30 | 1451 | CS301 | 5 CS305 | Networks | 60 | 1451 | CS301 | 5 CS301 | Databases | 30 | 1556 | CS301 | 5 CS305 | Networks | 60 | 1556 | CS301 | 5 CS301 | Databases | 30 | 1451 | CS305 | 5 CS305 | Networks | 60 | 1451 | CS305 | 5 CS301 | Databases | 30 | 1432 | CS305 | 4 CS305 | Networks | 60 | 1432 | CS305 | 4 (8 rows)

This result is called the direct or Cartesian product of tables: each row of one table is appended to each row of the other table.

Explain Cartesian product of tables?

In database theory, the Cartesian product of two tables is a result set that contains all possible combinations of rows from both tables. It is also known as a cross join, and it is represented by the × symbol in SQL.

When two tables are cross-joined, every row of the first table is matched with every row of the second table, resulting in a new table with the number of rows equal to the product of the number of rows in each table. For example, if Table A has m rows and Table B has n rows, then the Cartesian product of A and B will have m * n rows.

Here is an example of how to compute the Cartesian product of two tables using SQL:

SELECT *
FROM TableA
CROSS JOIN TableB;

This will return a result set that contains all possible combinations of rows from TableA and TableB. The resulting table will have columns from both tables and the number of rows equal to the product of the number of rows in each table.

It is important to note that a Cartesian product can be computationally expensive, especially for large tables. Therefore, it is generally not recommended to use a cross join unless it is necessary for a specific data analysis task.

As a rule, you can get a more useful and informative result if you specify the join condition in the WHERE clause. Let’s get all scores for all courses, matching courses to exams in this course:

test=> SELECT courses.title, exams.s_id, exams.score FROM courses, exams WHERE courses.c_no = exams.c_no; title | s_id | score -----------+------+------- Databases | 1451 | 5 Databases | 1556 | 5 Networks | 1451 | 5 Networks | 1432 | 4 (4 rows)

Another way to join tables is to explicitly use the JOIN key-word. Let’s display all students and their scores for the “Net-works” course

test=> SELECT students.name, exams.score FROM students JOIN exams ON students.s_id = exams.s_id AND exams.c_no = 'CS305'; name | score --------+------- Anna | 5 Victor | 4 (2 rows)

in this example, the result does not include the rows of the original table that do not have a pair in the other table: al-though the condition is applied to the subjects, the students that did not take an exam in this subject are also excluded. To include all the students into the result, use the outer join:

A LEFT JOIN is a type of join operation in SQL that returns all the rows from the left table and matching rows from the right table, and if there is no match in the right table, it returns NULL for all the right table's columns.

SELECT * FROM TableA LEFT JOIN TableB
ON TableA.Key = TableB.Key;

In this example, the TableA is the left table, and TableB is the right table. A LEFT JOIN is a type of join operation in SQL that returns all the rows from the left table and matching rows from the right table, and if there is no match in the right table, it returns NULL for all the right table's columns.

test=> select * FROM exams; s_id | c_no | score ------+-------+------- 1451 | CS301 | 5 1556 | CS301 | 5 1451 | CS305 | 5 1432 | CS305 | 4 (4 rows)

test=> SELECT students.name, exams.score FROM students LEFT JOIN exams ON students.s_id = exams.s_id AND exams.c_no = 'CS305'; name | score --------+------- Anna | 5 Victor | 4 Nina |
(3 rows)

test=> SELECT students.name, exams.score FROM students LEFT JOIN exams ON students.s_id = exams.s_id WHERE exams.c_no = 'CS305'; name | score --------+------- Anna | 5 Victor | 4 (2 rows)

RIGHT JOIN in SQL, you can simply reverse the order of the tables in a LEFT JOIN query, like this:

SELECT * FROM TableA RIGHT JOIN TableB
ON TableA.Key = TableB.Key;

In this example, TableA is the right table, and TableB is the left table. The RIGHT JOIN operation returns all the rows from TableA and matching rows from TableB, and if there is no match in TableB, it returns NULL for all the left table's columns.

SELECT *
FROM TableB
RIGHT JOIN TableA
ON TableB.Key = TableA.Key;

In this example, TableB is the right table, and TableA is the left table. The RIGHT JOIN operation returns all the rows from TableB and matching rows from TableA, and if there is no match in TableA, it returns NULL for all the left table's columns.

test=> select * FROM students;
 s_id |  name  | start_year 
------+--------+------------
 1451 | Anna   |       2014
 1432 | Victor |       2014
 1556 | Nina   |       2015
(3 rows)

test=> select * FROM exams;
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
 1451 | CS305 |     5
 1432 | CS305 |     4
(4 rows)

test=> SELECT students.name, exams.score
FROM students RIGHT JOIN exams
ON students.s_id = exams.s_id AND exams.c_no = 'CS305';
  name  | score 
--------+-------
        |     5
        |     5
 Anna   |     5
 Victor |     4
(4 rows)

&pard;Subqueries

The SELECT operation returns a table, which can be displayed as the query result (as we have already seen) or used in another SQL query. Such a nested SELECT command in paren-theses is called a subquery.

test=> SELECT name,
(SELECT score FROM exams WHERE exams.s_id = students.s_id AND exams.c_no = 'CS305')
FROM students;
  name  | score 
--------+-------
 Anna   |     5
 Victor |     4
 Nina   |      
(3 rows)

test=> SELECT * FROM students;
 s_id |  name  | start_year 
------+--------+------------
 1451 | Anna   |       2014
 1432 | Victor |       2014
 1556 | Nina   |       2015
(3 rows)

test=> SELECT * FROM exams;
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
 1451 | CS305 |     5
 1432 | CS305 |     4
(4 rows)

test=> SELECT name, 
(SELECT score FROM exams WHERE exams.s_id = students.s_id AND exams.c_no = 'CS305')
FROM students;
  name  | score 
--------+-------
 Anna   |     5
 Victor |     4
 Nina   |      
(3 rows)

test=> SELECT * FROM exams 
WHERE ( SELECT start_year FROM students WHERE students.s_id = exams.s_id ) > 2014;
 s_id | c_no  | score 
------+-------+-------
 1556 | CS301 |     5
(1 row)

test=> SELECT name, start_year FROM students
WHERE s_id IN (SELECT s_id FROM exams WHERE c_no = 'CS305');

  name  | start_year 
--------+------------
 Anna   |       2014
 Victor |       2014
(2 rows)

test=> SELECT name, start_year FROM students
WHERE s_id NOT IN (SELECT s_id FROM exams WHERE score < 5);

 name | start_year 
------+------------
 Anna |       2014
 Nina |       2015
(2 rows)

test=> SELECT name, start_year FROM students
WHERE NOT EXISTS (SELECT s_id FROM exams WHERE exams.s_id = students.s_id AND score < 5);
 name | start_year 
------+------------
 Anna |       2014
 Nina |       2015
(2 rows)

test=> SELECT * FROM courses;
 c_no  |   title   | hours 
-------+-----------+-------
 CS301 | Databases |    30
 CS305 | Networks  |    60
(2 rows)

test=> SELECT * FROM exams;
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
 1451 | CS305 |     5
 1432 | CS305 |     4
(4 rows)

test=> SELECT exams.* FROM courses, exams WHERE courses.c_no = exams.c_no AND courses.title = 'Databases';
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
(2 rows)

test=> SELECT s.name, ce.score FROM students s
JOIN (SELECT exams.* FROM courses, exams WHERE courses.c_no = exams.c_no AND courses.title = 'Databases') ce
ON s.s_id = ce.s_id;
 name | score 
------+-------
 Anna |     5
 Nina |     5
(2 rows)

est=> SELECT s.name, e.score FROM students s, courses c, exams e
WHERE c.c_no = e.c_no AND c.title = 'Databases' AND s.s_id = e.s_id ;
 name | score 
------+-------
 Anna |     5
 Nina |     5
(2 rows)

¶Sorting

test=> SELECT * FROM exams ORDER BY score, s_id, c_no DESC;
 s_id | c_no  | score 
------+-------+-------
 1432 | CS305 |     4
 1451 | CS305 |     5
 1451 | CS301 |     5
 1556 | CS301 |     5
(4 rows)

¶Grouping Operations

test=> SELECT count(*), count(DISTINCT s_id), avg(score) FROM exams;
 count | count |        avg         
-------+-------+--------------------
     4 |     3 | 4.7500000000000000
(1 row)

test=> SELECT students.name FROM students, exams WHERE students.s_id = exams.s_id AND exams.score = 5
GROUP BY students.name HAVING count(*) > 1;
 name 
------
 Anna
(1 row)

¶Changing and Deleting Data

test=> select * from courses;
 c_no  |   title   | hours 
-------+-----------+-------
 CS301 | Databases |    30
 CS305 | Networks  |    60
(2 rows)

test=> UPDATE courses SET hours = hours * 2 WHERE c_no = 'CS301';
UPDATE 1

test=> select * from courses ;
 c_no  |   title   | hours 
-------+-----------+-------
 CS305 | Networks  |    60
 CS301 | Databases |    60
(2 rows)

test=> SELECT * FROM exams ;
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
 1451 | CS305 |     5
 1432 | CS305 |     4
(4 rows)

test=> DELETE FROM exams WHERE score < 5;
DELETE 1

test=> SELECT * FROM exams ;
 s_id | c_no  | score 
------+-------+-------
 1451 | CS301 |     5
 1556 | CS301 |     5
 1451 | CS305 |     5
(3 rows)

¶Transactions

In PostgreSQL, a transaction is a group of database operations that are executed as a single unit of work. Transactions are used to ensure that all operations are completed successfully or none of them are completed at all. This is known as the ACID properties of a transaction:

Atomicity: All operations within a transaction are treated as a single unit of work, which either completes successfully or is rolled back if an error occurs. If any part of a transaction fails, the entire transaction is rolled back and the database is returned to its previous state.
Consistency: The database remains in a consistent state before and after a transaction. This means that all constraints, such as primary key and foreign key constraints, are satisfied.
Isolation: Each transaction is executed in isolation from other transactions, meaning that changes made by one transaction are invisible to other transactions until the first transaction is committed.
Durability: Once a transaction is committed, its changes are permanent and cannot be undone, even in the event of a system failure.

PostgreSQL provides several commands to control transactions, such as BEGIN, COMMIT, and ROLLBACK. Transactions can also be automatically managed by client libraries or ORMs that implement transaction management on top of the PostgreSQL protocol.

Client libraries and ORMs (Object-Relational Mappers) are software components that provide an abstraction layer over the database and simplify the process of interacting with it. These components can manage transactions on behalf of the application developer, so the developer doesn't have to write explicit transaction management code.

Some popular client libraries for PostgreSQL include libpq, JDBC, and psycopg2. These libraries provide functions and methods to connect to a PostgreSQL database, execute SQL commands, and manage transactions.

ORMs, on the other hand, provide an object-oriented interface for working with a database. Instead of writing SQL queries, the developer writes code that interacts with objects that represent database tables or views. ORMs like SQLAlchemy, Django ORM, and Hibernate can manage transactions transparently for the developer, so they don't have to worry about manually starting or committing transactions.

Using a client library or ORM can save developers time and effort in managing transactions, but it's still important to understand the underlying concepts of transactions and how they work.

Rails Active Record is also an ORM (Object-Relational Mapper) that provides an interface between a Rails application and a relational database. Active Record maps database tables to Ruby classes and provides methods for querying and manipulating data in those tables.

Active Record is a key component of the Rails framework, and it implements many of the standard features of an ORM, including:

Mapping database tables to Ruby classes
Generating SQL queries based on Ruby method calls
Defining associations between database tables
Validating data before it is saved to the database
Handling transactions to ensure data consistency

Active Record makes it easy for Rails developers to work with relational databases without having to write low-level SQL queries or worry about the details of database management. Instead, they can focus on writing Ruby code that interacts with their data models, which makes it easier to write and maintain their applications.

test=> \d
          List of relations
 Schema |   Name   | Type  |  Owner  
--------+----------+-------+---------
 public | courses  | table | alexlai
 public | exams    | table | alexlai
 public | students | table | alexlai
(3 rows)

test=> CREATE TABLE groups( g_no text PRIMARY KEY,
monitor integer NOT NULL REFERENCES students(s_id));
CREATE TABLE

test=> \d
          List of relations
 Schema |   Name   | Type  |  Owner  
--------+----------+-------+---------
 public | courses  | table | alexlai
 public | exams    | table | alexlai
 public | groups   | table | alexlai
 public | students | table | alexlai
(4 rows)

test=> \d+ groups
                                          Table "public.groups"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 g_no    | text    |           | not null |         | extended |             |              | 
 monitor | integer |           | not null |         | plain    |             |              | 
Indexes:
    "groups_pkey" PRIMARY KEY, btree (g_no)
Foreign-key constraints:
    "groups_monitor_fkey" FOREIGN KEY (monitor) REFERENCES students(s_id)
Access method: heap

¶ Heap- default storage method for tables

In PostgreSQL, the "heap" access method refers to the default storage method for tables. When you create a table without specifying a storage method, it is created as a heap-organized table by default.

A heap-organized table stores its data in an unordered collection of pages called "heap pages". When a new row is inserted, it is placed at the end of the last page in the table. If the page is full, a new page is added to the end of the table, and the row is inserted there. This process continues as rows are added to the table, resulting in a "heap" of pages containing the table's data.

When you query a heap-organized table, PostgreSQL uses a sequential scan to read through all of the pages in the table, looking for the rows that match the query criteria. This can be slow for large tables or for queries that only need to access a small portion of the data.

To speed up queries on large tables, you can create indexes on the columns that are frequently queried. An index provides a faster way to look up rows based on the values in the indexed columns.

Other access methods in PostgreSQL, such as btree, hash, and gist, provide alternative ways of storing and accessing data that may be more efficient for certain types of queries or data.

we can add a new column into the already existing table:

test=> \d+ students
                                           Table "public.students"
   Column   |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 s_id       | integer |           | not null |         | plain    |             |              | 
 name       | text    |           |          |         | extended |             |              | 
 start_year | integer |           |          |         | plain    |             |              | 
Indexes:
    "students_pkey" PRIMARY KEY, btree (s_id)
Referenced by:
    TABLE "exams" CONSTRAINT "exams_s_id_fkey" FOREIGN KEY (s_id) REFERENCES students(s_id)
    TABLE "groups" CONSTRAINT "groups_monitor_fkey" FOREIGN KEY (monitor) REFERENCES students(s_id)
Access method: heap

test=> ALTER TABLE students
ADD g_no text REFERENCES groups(g_no);
ALTER TABLE
test=> \d+ students
                                           Table "public.students"
   Column   |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 s_id       | integer |           | not null |         | plain    |             |              | 
 name       | text    |           |          |         | extended |             |              | 
 start_year | integer |           |          |         | plain    |             |              | 
 g_no       | text    |           |          |         | extended |             |              | 
Indexes:
    "students_pkey" PRIMARY KEY, btree (s_id)
Foreign-key constraints:
    "students_g_no_fkey" FOREIGN KEY (g_no) REFERENCES groups(g_no)
Referenced by:
    TABLE "exams" CONSTRAINT "exams_s_id_fkey" FOREIGN KEY (s_id) REFERENCES students(s_id)
    TABLE "groups" CONSTRAINT "groups_monitor_fkey" FOREIGN KEY (monitor) REFERENCES students(s_id)
Access method: heap

¶ Transaction

In PostgreSQL, a transaction is a group of SQL statements that are executed as a single unit of work. The main idea behind transactions is to ensure that a set of related changes to the database are treated as a single, atomic operation, meaning that either all of the changes are applied or none of them are.

BEGIN; <-- start with a TRANSACTION

UPDATE customers SET balance = balance - 100 WHERE id = 1;
INSERT INTO transactions (customer_id, amount) VALUES (1, -100);

COMMIT;

For example, from one terminal-a

test=> BEGIN;
BEGIN
test=*> INSERT INTO groups(g_no, monitor)
SELECT 'A-101', s_id
FROM students
WHERE name = 'Anna';
INSERT 0 1
test=*>

And from another temial-b

$ psql -U postgres -d test
psql (15.1)
Type "help" for help.

test=# SELECT * FROM groups;
 g_no | monitor 
------+---------
(0 rows)

test=# select * from students;
 s_id |  name  | start_year | g_no 
------+--------+------------+------
 1451 | Anna   |       2014 | 
 1432 | Victor |       2014 | 
 1556 | Nina   |       2015 | 
(3 rows)

test=# UPDATE students SET g_no = 'A-101';
ERROR:  insert or update on table "students" violates foreign key constraint "students_g_no_fkey"
DETAIL:  Key (g_no)=(A-101) is not present in table "groups".
test=# \d+ students
                                           Table "public.students"
   Column   |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 s_id       | integer |           | not null |         | plain    |             |              | 
 name       | text    |           |          |         | extended |             |              | 
 start_year | integer |           |          |         | plain    |             |              | 
 g_no       | text    |           |          |         | extended |             |              | 
Indexes:
    "students_pkey" PRIMARY KEY, btree (s_id)
Foreign-key constraints:
    "students_g_no_fkey" FOREIGN KEY (g_no) REFERENCES groups(g_no)
Referenced by:
    TABLE "exams" CONSTRAINT "exams_s_id_fkey" FOREIGN KEY (s_id) REFERENCES students(s_id)
    TABLE "groups" CONSTRAINT "groups_monitor_fkey" FOREIGN KEY (monitor) REFERENCES students(s_id)
Access method: heap

Go back to terminal-a,

test=*> COMMIT;
COMMIT
test=>

Go back to terminal-b,

test=# UPDATE students SET g_no = 'A-101';
UPDATE 3
test=# SELECT * from students;
 s_id |  name  | start_year | g_no  
------+--------+------------+-------
 1451 | Anna   |       2014 | A-101
 1432 | Victor |       2014 | A-101
 1556 | Nina   |       2015 | A-101
(3 rows)

test=#

¶ Scehma

In PostgreSQL, a schema is a logical container that groups related database objects together. Database objects include tables, views, indexes, sequences, functions, operators, and data types.

When you create a new table like the example you provided, it is created within a specific schema, which can be specified using the CREATE TABLE command. If you don't specify a schema name, the table is created in the public schema by default.

For example, you can create the courses table in a schema named my_schema like this:

CREATE SCHEMA my_schema;

CREATE TABLE my_schema.courses (
    c_no text PRIMARY KEY,
    title text,
    hours integer
);

In this example, the courses table is created within the my_schema schema. You can then query the table using the fully-qualified name my_schema.courses.

So a schema is a higher-level container that holds database objects like tables. The statements to create individual tables are just part of the objects that can be included in a schema.