SQL queries within queries

11 Oct 2019

Notes based on Head First SQL

Sometimes, you need to ask your database more than one question, or take the result of one query and use it as the input to another query. This is where subqueries come in, they help you make your queries more dynamic and avoid duplicate data.

SCENARIO: We want to pull out all of the developers with at least five years of experience, and who don't require a salary that is higher than 105,000.


SELECT contacts.last_name, contacts.first_name, contacts.phone
FROM contacts
  NATURAL JOIN
  job_desired
WHERE job_desired.title = 'Web Developer'
AND job_desired.salary_low < 105000;

The query above results in a table that shows the first name, last name and phone numbers of the people who desire a 'Web Developer' job AND whose lowest desired salary boundary is less than 105,000

Matching contacts with desired jobs


SELECT title
FROM job_listings
GROUP BY title
ORDER BY title;

The query above returns all of the available job titles in a table called 'job_listings'.


SELECT contacts.first_name, contacts.last_name, contacts.phone, job_current.title
FROM job_current
NATURAL JOIN my_contacts
WHERE job_current title IN ('cook', 'Hairdresser', 'Waiter', 'Web Developer');

The above query results in a table containing the first name, last name, phone number and current job title of each contact in a contacts table, as well as a column containing their desired jobs ONLY if their desired jobs are contained in a list of open job listings - which are currently hard-coded. In other words, we can see a list of contacts whose desired jobs match current job openings.

The problem with this query is that we are doing two queries at the same time, where the second query is not even a query because it is hard coded. The solution to this is as follows:

Subqueries

A subquery is a query that is wrapped within another query, it's also called an INNER query.

To accomplish what two queries can do with just one query, we can add a sub-query into our query.


SELECT some_column, another_column
FROM table
WHERE column = (SELECT column FROM table);

Because the outer query uses an '=' operator, this subquery will return a single value, one row from one column (sometimes called a cell, but in SQL is known as a scalar value, which is compared to the columns in the WHERE clause

A real example

We can call the second query that we used to get the matches from the porfessions table the OUTER query, because it will wrap up inside of itself the INNER query.


SELECT contacts.first_name, contacts.last_name, contacts.phone, job_current.title
FROM job_current
NATURAL JOIN my_contacts
WHERE job_current title IN (SELECT title FROM job_listings);

We don't need to retype all the professions from our first query anymore because the inner query (inside the parentheses on the last line) takes care of that for us.

You can use joins in the place of many sub queries, but some people find it simpler to write sub queries, so you have a choice of syntax.


SELECT mc.first_name AS firstname, mc.last_name AS lastname, mc.phone AS phone, jc.title AS jobtitle
FROM job_current AS jc
NATURAL JOIN my_contacts AS mc
WHERE jobtitle IN (SELECT title FROM job_listings);

In the query above, we have used table aliases to make it easier to write out the queries. For example, our alias for 'my_contacts' is 'mc', so that we don't have to write out the full name in the query every time we use it. We have also created column aliases, this is to make the output of our joined columns easier to understand. They columns have been given names that make sense in their original tables, but outside of that context, they can be a little unclear. Aliases are temporary names that don't affect the original data, but are used mainly for readable queries and readable query results.

Subquery rules

Constructing subqueries

The hardest part about writing subqueries is not the structure of them, but figuring out what part of the query needs to be the subquery.

Analysing queries is like figuring out word problems. You identify words in the question that match things you know, like table and column names and break things apart.

An example question: Who makes the most money out of all of my contacts

  1. Rephrase the question in terms of the tables and columns in your database. "Who" means you want a first and last name from your contacts table. "The most money" means you need a MAX value from your "Job_current" table.
  2. Identify a query that answers part of the question: The MAX(salary) looks like a good candidate for our first query. SELECT MAX(salary) FROM job_current;
  3. The first part of the query is also easy, we just need to select the first and last names: SELECT mc.first_name, mc.last_name FROM my_contacts AS mc;
  4. Finally, we figure out how to link the two. We not only need names of people in our contacts table, but also their salaries so that we can compare them to our MAX(salary). We need a natural inner join to pull out the salary belonging to each person:
  5. Finally, we need a WHERE clause to link the two.

SELECT mc.first_name, mc.last_name, jc.salary
FROM my_contacts AS mc
NATURAL JOIN job_current AS jc
WHERE jc.salary =
(SELECT MAX(jc.salary) FROM job_current jc);

We could have done this without the subquery, using a natural inner join and a LIMIT command. There are many cases in SQL where there are more than one way to do things.

A subquery as a SELECT column

If a subquery is used as a column expression in a SELECT statement, it can only return one value from one column.


SELECT mc.first_name, mc.last_name,
(SELECT state
FROM zip_code
WHERE mc.zip_code = zip_code) AS state
FROM my_contacts mc;

The query above goes through all of the rows in the contacts table. For each of those contacts, it pulls out the first name, last name and state (where we find the state by taking the zip code and matching it up with the correct state in the zip code table).


SELECT jc.salary
FROM my_contacts mc
NATURAL JOIN job_current jc
WHERE email = 'andy@weatherorama.com';

The above query returns a single result, the salary of a single person whose email address matched the email address specified. We can turn this query into a sub query that allows us to compare it to other salaries, so that we can list the contacts who earn more than him.


SELECT mc.first_name, mc.last_name, jc.salary
FROM
my_conctacts AS mc
NATURAL JOIN job_current AS jc
WHERE
jc.salary > (Single salary query goes here);

A noncorrelated subquery

If the subquery stands alone and doesn't reference anything from the outer query, it is a noncorrelated subquery.

All of the subqueries we have written so far in this post are examples of noncorrelated subqueries. The inner query gets processed first, then the result is used in the WHERE condition of the outer query. The inner query does not in any way depend on values from the outer query, it can be run as a standalone query.

Knowing what queries to run

Noncorreclated subqueries with IN and NOT IN

A noncorrelated subquery uses IN or NOT IN to test if the values returnedin the subquery are members of a set or not.


SELECT mc.first_name, mc.last_name, mc.phone, jc.title
FROM job_current AS jc NATURAL JOIN my_contacts AS mc
WHERE jc.title IN (SELECT title FROM job_listings);

IN evaluates each row of each jc.title values against the entire set returned by the subquery

Using NOT IN in this case lets you see job titles that don't match the job listings.

Correlated subqueries

A correlated subquery means that the inner query relies on the outer query before it can be resolved.


SELECT mc.first_name, mc.last_name
FROM my_contacts AS mc
WHERE
3 = (
SELECT COUNT(*) FROM contact_interest
WHERE contact_id = mc.contact_id
);

The query above counts the number of interests in the interest table for each person in my_contacts, then returns the first and last name of those people who have three interests

The subquery depends on the outer query. It needs the value for contact_id from the outer query before the inner query can be processed.


SELECT mc.first_name firstname, mc.last_name lastname, mc.email email
FROM my_contacts mc
WHERE NOT EXISTS
(SELECT * FROM job_current jc
WHERE mc.contact_id = jc.contact_id);

Just like the IN and NOT IN, you can use both EXISTS and NOT EXISTS with your subqueries. THe query above returns data from my_contacts where the contact_ids do not show up at all in the contact_interest table.