How Do Databases Work? #
Databases work using a Database Management System (DBMS), which provides tools for storing, retrieving, and manipulating data.
Tables #
A table is a data structure used to store information in a structured format (rows and columns).
Each row represents a data entity, and each column defines a specific information field. The data contained in these tables can be consulted, modified, updated, and organized via the SQL language.
Each column has a name, a data type, and a set of constraints that define the allowed values.
Keys #
Keys are one of the most important elements of a relational database to maintain the relationship between tables and they also allow you to uniquely identify data in a table. The primary key is a key that allows you to uniquely identify the tuple of the database. In contrast, the foreign key is a key used to determine the relationship between tables through the primary key of a table, that is, the primary key of one table acts as a foreign key for another table.
What is a Primary Key? #
The primary key is a key that allows you to uniquely identify the tuple of the database. A primary key is used to ensure that the data in a specific column is unique. A column cannot have NULL values. It is either an existing table column or a column specifically generated by the database according to a defined sequence.
The primary key of a table is optional, but it is not recommended to omit it.
What is a Foreign Key? #
A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It is a column (or multiple columns) that refers to a column (most often the primary key) of another table.
Schemas #
A schema is a logical structure that defines how data is organized (tables, relationships, constraints, indexes, etc.).
A database can contain multiple schemas.
CREATE SCHEMA company;
CREATE TABLE company.employees (
id INT PRIMARY KEY,
name VARCHAR(100),
position VARCHAR(50)
);
Cardinalities #
Cardinalities (or multiplicities) indicate the number of instances of one class that can be associated with an instance of another class in a relationship (association).
Here is a list of the most common cardinalities and their meaning:
| UML Notation | Meaning | Description |
|---|---|---|
| 0..1 | Zero or one | An instance can be associated with none or exactly one other instance. |
| 1 | Exactly one | There is always exactly one associated instance. |
| 0..* or * | Zero or more | There can be none, one, or multiple associated instances. |
| 1..* | One or more | There is at least one associated instance. |
| n | Exactly n | There are exactly n associated instances. |
| 0..n | From zero to n | Between no and n associated instances. |
| m..n | From m to n | At minimum m and at most n associated instances. |
Concrete examples:
-
A person owns 0..* cars → a person can have no or multiple cars.
-
A car belongs to 1 person → each car has exactly one owner.
-
A course is followed by 1..* students → at least one student per course.
Constraints #
A constraint is a rule that defines the values and data types allowed for a table or column. Constraints are used to guarantee the integrity and consistency of data in a database. If there is a violation between the constraint and the action on the data, the action is interrupted.
The following constraints are commonly used in SQL:
- NOT NULL - Ensures that a column cannot have a NULL value
- UNIQUE - Ensures that all values in a column are different.
- PRIMARY KEY - A combination of NOT NULL and UNIQUE. Uniquely identifies each row in a table.
- FOREIGN KEY - Prevents actions that would destroy links between tables.
- CHECK - Ensures that the values in a column satisfy a specific condition
- DEFAULT - Sets a default value for a column if no value is specified.
- CREATE INDEX - Used to create and retrieve data very quickly from the database.
Index #
An index is a data structure that improves the speed of data retrieval operations by providing fast access paths.
An index can be created on one or multiple columns of a table.
CREATE INDEX idx_employees_name ON company.employees (name);
Creating an index occupies memory space in the database and, as it is updated each time the attached table is modified, can increase the processing time of the RDBMS when entering data. Therefore, the creation of an index must be justified and the columns to which it relates must be carefully chosen (in order to minimize duplication). Thus, some DBMSs automatically create an index when a primary key is defined.
Views #
A view is a virtual table based on the result of a query. They allow you to simplify complex queries and provide a simplified view of the data.
Views act as a facade: A view can be considered as a facade to hide the complexity of a data model.
-
Basic syntax: Use CREATE VIEW to define a view from a complex query, often created from multiple joins
-
Common rules: Use the widest queries possible (avoid WHERE restrictions in views unless necessary); use column aliases to facilitate subsequent queries and avoid naming conflicts.
-
Views are dynamic: When using a view, the original query is “replayed”, the data collected is therefore what exists in the original tables.
-
Updates are prohibited: views do not allow the addition / modification / deletion of data.
CREATE VIEW v_employees AS
SELECT e.id, e.name, e.position
FROM company.employees e;
A view is used like a table in a SELECT:
SELECT firstname, lastname FROM employee;
You can use the view as a complete set and therefore include it in a join, a union, an intersection.
A view can be deleted:
DROP VIEW employee;
Deleting the view will not delete the data from the tables used to create the view.
A view can be modified:
ALTER VIEW employee AS (new_query);
Transactions #
Guaranteeing data integrity through ACID properties (atomicity, consistency, isolation, durability).
Atomicity #
Atomicity is the all-or-nothing principle. When a transaction is initiated, it must either complete fully or not complete at all. If any part of the transaction fails, the entire transaction fails, the entire transaction is canceled, leaving the database unchanged. For example, if you transfer money between bank accounts, the debit and credit operations must succeed together. If one part fails, neither operation is applied, which guarantees that the database remains in its original state.
Consistency #
Consistency ensures that the database remains in a valid state before and after the transaction. This means that all data integrity constraints are maintained throughout the transaction. If a transaction violates an integrity rule, it will not be committed. For example, if you have a rule that prevents negative balances on bank accounts, any transaction attempting to overdraft an account will be reversed. This ensures that all transactions lead the database from one consistent state to another, thus preserving data accuracy.
Isolation #
Isolation means that concurrent transactions do not interfere with each other. Each transaction operates independently and intermediate states are not visible to other transactions. This helps avoid problems such as “dirty reads”, where a transaction reads uncommitted modifications of another transaction. For example, if two users simultaneously update the balance of the same account, isolation ensures that each transaction is processed in such a way that it does not see the uncommitted modifications of the other. This helps maintain data consistency and avoid anomalies.
Durability #
Durability ensures that once a transaction is committed, its effects are permanent, even in the event of system failure. This means that the changes made by the transaction are recorded in non-volatile memory. For example, if you update a customer’s address in a database and the transaction is committed, this change will persist even if the system crashes immediately after. Durability ensures that committed transactions are not lost and that the database can return to a consistent state after a crash.
How Many Types? #
Databases can be classified into four categories:
Relational Databases (RDBMS) #
Use tables with predefined schemas and relationships (for example, MySQL, PostgreSQL, SQL Server).
NoSQL Databases #
Databases that handle unstructured or semi-structured data with flexible schemas (for example, MongoDB, Cassandra).
In-Memory Databases #
Databases that store data in RAM for fast access (for example, Redis). The main objective is to exploit a limited set of data with very low latency.
Graph Databases #
Database that uses nodes and edges to store relationships (for example, Neo4j).
How Do Databases Interact? #
Databases interact with applications and users through
SQL Queries #
The SQL (Structured Query Language) language is commonly used to retrieve or modify data. CRUD operations (Create, Read, Update, Delete) are commonly used to interact with databases. These operations are used to manage data in a database.
Data Types #
| Type | Description |
|---|---|
| CHAR, VARCHAR | Character string |
| DATE, DATETIME, TIME, TIMESTAMP | Dates and times |
| INT, FLOAT | Numbers |
| BLOB | Binary data (images, files, etc.) |
| JSON | Structured data |
| POINT | Geographic coordinates |
Basic SQL Commands #
The structure of an SQL query is composed of several commands.
SELECT projection
FROM sets
WHERE conditions
GROUP BY grouping_criteria
HAVING condition_after_grouping
ORDER BY sort_criteria
LIMIT n;
The projection identifies the information, separated by a comma, that you want to retrieve and list. The projection is located immediately after SELECT and before FROM:
SELECT *
FROM student;
Specific case “”: the character “” allows you to define all the columns of the set to “project”, in this case, in the example, all the columns of the “employees” table, this query is therefore identical to:
SELECT id, lastname, firstname
FROM student;
Column “aliases” can be used for various reasons, the first is to make a result more readable:
SELECT lastname name, firstname first_name
FROM student;
The keyword “AS” can be used, but is no longer required to define an alias, just a space between the projection column and its alias is sufficient.
Aliases are often used for calculated columns in the projection.
SELECT lastname, firstname, (salary * (1 - 40 / 100)) "net salary"
FROM student;
It is recommended to use column aliases when creating views.
It is recommended to use aliases in joins to avoid ambiguities on column names and / or facilitate reading:
SELECT s.name, s.firstname, se.name
FROM student s JOIN service se ON s.service_id = se.id;
The alias “s” identifies the “student” table; the alias “se” identifies the “service” table. We generally use the initials of the tables as aliases. Once the table is aliased, it is mandatory to use the alias to prefix the columns.
WHERE Condition #
Filtering rows is managed by the WHERE clause in a query. WHERE is used to restrict the number of rows returned, based on the “boolean” expression provided.
Operators:
=, >, <, >=, <=, <>, !=, AND, OR, XOR
Clauses:
LIKE, IN, BETWEEN, IS NULL
The LIKE operator allows you to determine the membership of a value to a set of “approximate” values:
WHERE lastname LIKE 'd_boi%'; -- % = any string after, _ = one character
It is necessary to check the behavior of the LIKE operator depending on the DBMS used.
MySQL and MariaDB are case insensitive:
LIKE ‘DuPo%’ will be accepted for all case forms.
PostgreSQL being case sensitive, it is necessary to read the documentation to get the same result.
The IN operator allows you to determine the membership of a value to a set. We often use IN with a subquery:
SELECT e.lastname, e.firstname
FROM employees e
WHERE e.service_id IN (SELECT id FROM services WHERE name
= 'Accounting' OR name = 'Sales');
The BETWEEN operator allows you to determine the membership of a value in a range. BETWEEN is often used with dates and numbers:
SELECT e.lastname, e.firstname
FROM employees e
WHERE e.birthdate BETWEEN '1980-01-01' AND '1990-12-31;
BETWEEN is inclusive, the bounds are included in the result
The NULL value is particular in SQL. It is not strictly a value, so it is necessary to use other operators: IS [NOT] NULL
SELECT e.lastname, e.firstname WHERE salary IS NULL;
SELECT e.lastname, e.firstname WHERE salary IS NOT NULL;
This syntax is widely used in outer joins.
GROUP BY Condition #
The GROUP BY clause allows you to group rows according to one or more criteria. GROUP BY is often used with aggregate functions. GROUP BY always comes after WHERE if applied:
SELECT
s.label,
AVG(e.note) AS Average
FROM subject s
JOIN evaluation e ON s.id = e.subject_id
JOIN student su ON e.student_id = su.id
JOIN promo p ON su.promo_id = p.id
WHERE p.id = 1
GROUP BY s.id;
GROUP BY is used to group rows by subject.
Without GROUP BY, AVG(e.note) would calculate the average of all grades in the promo, all subjects combined.
HAVING Condition #
HAVING allows you to filter data after grouping.
Example:
SELECT
s.label, p.label AS Promo, AVG(e.note) AS average_by_promo
FROM subject s
JOIN evaluation e ON s.id = e.subject_id
JOIN student su ON e.student_id = su.id
JOIN promo p ON su.promo_id = p.id
GROUP BY p.id, s.id
HAVING average_by_promo > 12;
HAVING average_by_promo > 12 only keeps groups with an average greater than 12.
ORDER BY Condition #
ORDER BY allows you to:
- Sort the final result according to one or more columns.
- Sort in ascending order (by default) or descending order (DESC).
Example:
ORDER BY s.lastname, s.birthdate DESC;
The sort clause allows you to sort by multiple columns (present or not in the projection). The first column will be the first order, the following ones will be dependent on the immediately preceding order.
The rows are sorted first by name, then, the first two are sorted by date of birth.
This is optimizable via indexes. You can also define the collation, the way you want to sort. Example: Managing the distinction between e, é and è.
Joins #
Inner Join #
Joins allow you to link multiple data sets according to their primary keys and foreign keys.
The old syntax defines with the Cartesian product and a WHERE condition as follows:
SELECT s.lastname, s.firstname, p.label
FROM student s, promo p
WHERE s.promo_id = p.id; -- Reference the foreign key of student to the primary key of promo
Problem: It will go through hundreds of promos and students and apply the WHERE… First Cartesian product then elimination of lines with WHERE.
The new syntax allows you to define joins with the following syntax:
SELECT s.lastname, s.firstname, p.label
FROM promo p JOIN student s ON p.id = s.promo_id; -- Equivalent INNER JOIN, internal join.
JOIN only returns matches between the two tables.
JOIN corresponds to INNER JOIN by default.
Concrete example:
-- List the grades obtained and the subjects
SELECT
e.note, s.label
FROM
subject s JOIN evaluation e ON s.id = e.subject_id; -- Reference the primary key of subject to the foreign key of evaluation
With only the “SQL” subject:
-- List the grades obtained associated with the "SQL" subject
SELECT
e.note, s.label
FROM
subject s JOIN evaluation e ON s.id = e.subject_id -- Reference the primary key of subject to the foreign key of evaluation
WHERE
s.label = 'SQL'; -- Restrict to the "SQL" subject
It is also possible to create multiple joins:
-- List the grades obtained associated with the "SQL" subject
SELECT
su.lastname, e.note, s.label
FROM
subject s
JOIN evaluation e ON s.id = e.subject_id -- Reference the primary key of subject to the foreign key of evaluation
JOIN student su ON e.student_id = su.id -- Join the previous set (result of a query) to student by referencing the foreign key of evaluation to the primary key of student
WHERE
s.label = 'SQL'; -- Restrict to the "SQL" subject
It is advised to always start with the table / set that has the fewest rows.
Outer Join (LEFT JOIN / RIGHT JOIN) #
Outer joins allow you to retain rows from a set, even when there is no match in the other set.
Unlike inner join (INNER JOIN), which only returns matches, an outer join also displays “orphaned” rows.
LEFT JOIN (left outer join)
The LEFT JOIN retains all rows from the left table, even if no match exists in the right table.
SELECT
e.note, s.label
FROM
subject s
LEFT JOIN evaluation e ON s.id = e.subject_id -- Keep all rows from the subject set (LEFT)
WHERE
e.id IS NULL; -- Only subjects whose grades are NULL
The OUTER keyword is optional, a LEFT or RIGHT join is always by definition an outer join.
RIGHT JOIN (right outer join)
The RIGHT JOIN retains all rows from the right table, even if no match exists in the left table.
SELECT
e.note,
s.label AS Subject
FROM
subject s
RIGHT JOIN evaluation e ON s.id = e.subject_id;
Example:
List all evaluations (grades) even if the subject has been deleted from the subject table:
SELECT
e.id, e.note, s.label AS Subject
FROM
subject s
RIGHT JOIN evaluation e ON s.id = e.subject_id
WHERE
s.id IS NULL;
Union #
Union is the set concept which consists of obtaining all elements that correspond to either set A or set B.
Concretely, the sets used must have the same number of columns, with the same type and in the same order.
SELECT col1, col2 col3 FROM set1
UNION
SELECT col1, col2, col3 FROM set2;
By default, exactly identical rows are not repeated in the result.
Intersect #
Intersection is the set concept which consists of retrieving common rows between multiple sets.
Concretely, the sets used must have the same number of columns, with the same type and in the same order.
SELECT col1, col2 col3 FROM set1
INTERSECT
SELECT col1, col2, col3 FROM set2;
Windowing #
Window functions were introduced in 2003 in the SQL language. Similar to aggregate functions, they perform calculations on the returned set, but unlike aggregation, rows are retained in the result.
The keyword OVER will designate the grouping function as a “windowing” function.
The following example illustrates how to use a function as a windowing function:
SELECT
lastname,
firstname,
salary,
SUM(salary) OVER(ORDER BY lastname) salary_mass
FROM
employees;
You can read this query as: perform the sum of salaries over all employees, as well as names, first names and salaries sorted in order of their surname.
It is possible to use PARTITION BY in the OVER windowing function to perform a grouping on one of the columns.
SELECT
e.lastname,
e.firstname,
sv.name,
e.salary,
SUM(salary) OVER(PARTITION BY sv.id ORDER BY sv.name) s_salary_mass
FROM
service sv JOIN employees e ON sv.id = e.service_id;
The result of the previous query will give a result like the following:
Here, the SUM function with windowing on s_salary_mass.
The ROW_NUMBER() function can be used to display the number of a given row. Used together with a PARTITION BY the numbering will restart at 1 at each break.
SELECT service_id, lastname, ROW_NUMBER() OVER(ORDER BY service_id) num_row
FROM employees;
The RANK() function, unlike ROW_NUMBER() which gives incremental numbers, provides the rank (so possibly the same rank for different rows with the same value).
SELECT service_id, lastname, salary, RANK() OVER(PARTITION BY service_id ORDER BY salary) rank
FROM employees;
The DENSE_RANK() function identical to RANK() provides the rank of a value but unlike RANK() this function will not skip a rank in case of identical values.
The NTILE(slice_nb) function allows you to display the number of the “slice” of grouping in which the row is located.
The LAG(col, distance), LEAD(col, distance) functions allow you to display the column “col” at the distance “distance” either backwards (LAG) or forwards (LEAD), which can be useful for comparing a value to another in a query.
Aggregate Functions #
Aggregate functions allow you to calculate values on columns.
The SQL language provides 5 aggregate functions:
| Function | Description |
|---|---|
| COUNT(*) | Count the rows returned by a SELECT query |
| SUM(expr) | Cumulate the expression (expr) from the returned rows |
| AVG(expr) | Calculate the average of the expr expression from the returned rows |
| MAX(expr) | Determine the largest value expr from the returned rows |
| MIN(expr) | Determine the smallest value expr from the rows |
The so-called aggregate functions return, in a simple SQL query, only one row. The following example is therefore incorrect:
SELECT e.lastname, COUNT(*) FROM employees e;
Indeed, the machine would not know how to return (in a simple way), all the surnames of the employees as well as the number of employees! An error will therefore be raised indicating that you cannot use an aggregate function in this context.
The COUNT(expr) function therefore returns the number of rows affected by an SQL query:
SELECT COUNT(*) nb_employees FROM employees;
This query will therefore return the total number of rows (*) in the employees table.
SELECT COUNT(salary) nb_salaried_employees FROM employees;
This query will return the number of rows in the employees table for which the value of the salary column is NOT NULL.
SELECT COUNT(*) FROM employees WHERE service_id = 1;
The AVG(expr) function returns the average of the defined expression (expr) depending on the affected rows:
SELECT AVG(salary) average_salary FROM employees;
Therefore returns the average salary of the employees table.
SELECT AVG(salary) salary_mass FROM employees WHERE salary >= 2000;
Returns the average salary of employees earning more than 2000 €.
The MAX(expr) function returns the maximum value of the defined expression (expr) depending on the affected rows:
SELECT MAX(salary) average_salary FROM employees;
Returns the highest salary of the employees table.
SELECT MAX(salary) salary_mass FROM employees WHERE salary < 2000;
Returns the maximum salary of employees earning less than 2000 €.
The MIN(expr) function returns the minimum value of the defined expression (expr) depending on the affected rows:
SELECT MIN(salary) average_salary FROM employees;
Returns the lowest salary of the employees table.
SELECT MIN(salary) salary_mass FROM employees WHERE salary > 2000
AND service_id = 2;
Returns the minimum salary of employees earning more than 2000 € in the service whose primary key is 2.
CRUD Operations #
There are four types of CRUD operations.
- Create
- Read
- Update
- Delete
Create #
CREATE is used to add new data to a table.
INSERT INTO company.employees (name, position)
VALUES ('John Doe', 'Manager');
The SQL language allows you to insert multiple rows in a single query:
INSERT INTO employee VALUES (4, 'Talut', 'Jean', 1950), (5, 'Lefort', 'Paul',
2100), (6, 'Dujardin', 'Martine', 2500);
The columns have been omitted because the data respects the number and order of the table columns, as well as any constraints and types defined.
Read #
READ is used to extract data from a table.
SELECT * FROM company.employees;
Update #
UPDATE is used to modify existing data in a table.
UPDATE company.employees
SET name = 'Jane Doe'
WHERE id = 1;
Warning: without a WHERE clause in an update query, all rows in the table are updated.
The use of the WHERE clause is strongly recommended.
It is entirely possible to use calculations in SQL queries. For example, if you want to increase all employees by 3%, you can write the following query:
UPDATE employee SET salary = salary * 1.03;
Delete #
Delete is used to delete data from a table.
DELETE FROM company.employees
WHERE id = 1;
Warning, the DELETE query should always be accompanied by a WHERE clause to avoid deleting all rows.
Special considerations:
A DELETE query may fail if referential integrity constraints are not met.
The CASCADE keyword can be used to delete both rows from a table as well as all rows referenced in other tables.
It is preferable to use a DELETE query within a transaction.
The TRUNCATE keyword can also be used to delete rows from one or more tables.
TRUNCATE [CASCADE] table_name;
TRUNCATE completely empties a table, it is not possible to add a WHERE clause, all rows will be deleted. This is equivalent to:
DELETE [CASCADE] FROM table_name;
TRUNCATE cannot be restored with a ROLLBACK.
APIs #
Applications connect to databases using APIs such as JDBC (Java Database Connectivity) or RESTful services.
Client-Server Architecture #
Databases run on a server and clients (applications) request data over a network.
ORM (Object-Relational Mapping) #
Developers use ORMs (for example, Hibernate, SQLAlchemy) to interact with databases using objects instead of raw SQL language.
Replication and Backup #
Replication consists of copying data from a database server (the primary) to one or more replica servers. If the primary server fails, the replicas can take over (high availability). Read requests can be sent to replicas, while write requests remain on the primary server. It can also be used to maintain up-to-date copies elsewhere (for example, in another region).
Backup consists of creating snapshots or copies of the entire database (data and sometimes schema) to external storage. It can be used to maintain historical data, for data migration, and also if the database is corrupted or lost.
Exercise #
We are asked to:
- List the available training programs in the Neolia catalog
- List the training programs with their theme (i.e. Information Technology, Management)
- List the number of training programs by theme
- List for a given training: the participants and the paying client
- List the revenue realized for all training actions carried out
- Determine the share of revenue paid by the OPCOs
- Determine the average revenue per training action by theme
- Determine the number of days sold
- Determine the profit margin of the business, by globalizing by themes; the margin being calculated according to the following rule: training cost * number_participants - number_days * hourly_rate_trainer - number_days * fixed_location_fee (80€)
- Show the share of public and private financing
The database contains the following information:
Queries:
-- 1. List the available training programs in the Neolia catalog
SELECT
libelle
FROM
formation;
-- 2. List the training programs with their theme (i.e. Information Technology, Management)
SELECT
f.libelle, t.libelle
FROM
formation f
JOIN thematique t ON f.thematique_id = t.id_thematique;
-- 3. List the number of training programs by theme
SELECT
COUNT(f.id_formation) "Number training", t.libelle "Themes"
FROM
formation f
JOIN thematique t ON f.thematique_id = t.id_thematique
GROUP BY
t.id_thematique;
-- 4. List for a given training: the number of participants and the paying client
SELECT
s.id_session,
f.libelle training_name,
c.libelle paying_client,
COUNT(p.id_participant) number_participants
FROM
participant p
JOIN employee e ON p.employee_id = e.id_employee
JOIN client c ON e.client_id = c.id_client
JOIN session_formation s ON p.session_formation_id = s.id_session
JOIN catalogue_formateur cf ON s.catalogue_formateur_id = cf.id_catalogue_formateur
JOIN formation f ON f.id_formation = cf.formation_id
WHERE
f.id_formation = 1
GROUP BY
s.id_session, f.libelle, c.libelle;
-- 5. Determine the revenue realized for all training actions carried out
SELECT
SUM(revenue_total) total_revenue
FROM (
SELECT
f.cost_ht * COUNT(p.id_participant) revenue_total
FROM
formation f
JOIN catalogue_formateur cf ON f.id_formation = cf.formation_id
JOIN session_formation sf ON cf.id_catalogue_formateur = sf.catalogue_formateur_id
JOIN participant p ON p.session_formation_id = sf.id_session
GROUP BY
f.id_formation
);
-- 6. Determine the share of revenue paid by the OPCOs (private clients with pre_opco)
SELECT
SUM(cost_ht * number_participants) revenue_opco
FROM (
SELECT
f.id_formation,
f.cost_ht AS cost_ht,
COUNT(p.id_participant) number_participants
FROM
participant p
JOIN employee s ON p.employee_id = s.id_employee
JOIN client c ON s.client_id = c.id_client
JOIN session_formation sf ON p.session_formation_id = sf.id_session
JOIN catalogue_formateur cf ON sf.catalogue_formateur_id = cf.id_catalogue_formateur
JOIN formation f ON cf.formation_id = f.id_formation
WHERE
c.organism_type = 1
AND c.pre_opco_float IS NOT NULL
GROUP BY
f.id_formation, f.cost_ht
);
-- 7. Determine the average revenue (€/day) of training actions by theme
SELECT
t.libelle theme,
SUM(sub.cost_ht * sub.number_participants) / SUM(sub.duration_days * sub.number_participants) average_revenue_per_day
FROM (
SELECT
f.id_formation,
f.cost_ht,
f.duration_days,
f.thematique_id,
COUNT(p.id_participant) number_participants
FROM
participant p
JOIN session_formation s ON p.session_formation_id = s.id_session
JOIN catalogue_formateur cf ON s.catalogue_formateur_id = cf.id_catalogue_formateur
JOIN formation f ON f.id_formation = cf.formation_id
GROUP BY
f.id_formation, f.cost_ht, f.duration_days, f.thematique_id
) sub
JOIN thematique t ON t.id_thematique = sub.thematique_id
GROUP BY
t.id_thematique, t.libelle;
-- 8. Determine the number of days sold
SELECT
f.duration_days * COUNT(p.id_participant) days_sold, s.id_session, f.libelle
FROM
participant p
JOIN session_formation s ON p.session_formation_id = s.id_session
JOIN catalogue_formateur cf ON cf.id_catalogue_formateur = s.catalogue_formateur_id
JOIN formation f ON f.id_formation = cf.formation_id
GROUP BY
s.id_session;
Conclusion #
It is essential to take time to think before designing the structure of our data. We must determine how to organize our data in a non-redundant way in order to optimize the database. It is important to explain the dependency relationships with cardinalities to clarify the links between the different tables. The first step is to identify what information is given and its role in the project. To test the system, we can proceed with the creation of a test set with AI, and do the same with the queries. We must then verify the results provided by the AI to ensure their accuracy. Finally, it is necessary to understand the names of the columns that will be returned in order to correctly manipulate the data.