Our SQL interview questions

edw519 · on May 28, 2013

I took a similar test, on-line while being watched. 4 sets of 10 multiple choice questions: SQL, unix commands, vi, & HTML. Make your 10 choices, click submit, get your score. It was kinda silly, but what the heck...

It was ridiculously easy and I got all 40 right without much thinking, as many people here would also, I imagine.

Then I asked, "Why bother with this after reading my resume?"

They answered, "We have to do this. We've interviewed 52 programmers with resumes similar to yours and no one else got them all right. In fact, the highest score before you was 32."

Wow. Is this the state of our industry now?

T-hawk · on May 28, 2013

It's the state of the bottom of our industry. Those 52 programmers aren't a representative set of the general population. Those 52 programmers are the ones that can't catch on for any jobs so they keep applying over and over. It's a sampling bias.

And by the way, this happens for most industries, not just technology. McDonald's has the same problem. Their majority of applicants fail at tasks like having the literacy to fill out an application form or getting out of bed and showing up to work. This doesn't mean the majority of the population is that deficient, just the majority of deadweight floating around the would-be job pool.

More generally stated, the worse an applicant is for his desired job, the more times his incompetence will attend interviews to be seen. Quality performers in any business get hired quickly and don't stay in the interviewing pool. So equivalently, any interview pool will consist mostly of bad candidates.

We need a name for this effect so that we can just quote it whenever this topic comes up, like Dunning-Kruger. Anyone got a good suggestion? Joel Spolsky was the first to set it out well and become widely read[1] , but Spolsky's Law is already used for the Law of Leaky Abstractions.

[1] http://www.joelonsoftware.com/items/2005/01/27.html

stonemetal · on May 28, 2013

After looking for a new job recently, I realized the converse is also true. The worse a company is as a place to work the more often they need to hire. Therefore the pool of available jobs mostly consists of jobs no one wants. Be picky in who you hire and be picky in where you apply.

jacques_chester · on May 28, 2013

> We need a name for this effect so that we can just quote it whenever this topic comes up, like Dunning-Kruger.

Sounds a lot like the Market for Lemons:

http://en.wikipedia.org/wiki/The_Market_for_Lemons

mgkimsal · on May 28, 2013

Took a similar test at a recruiting company in ... 2006 (IIRC). Mostly on PHP. And the test was wrong. IIRC, I got 24 out of 25. They were ecstatic - "wow, no one in this office ever got such a high score - you were almost perfect"

"I was," I said. "One of those questions is wrong".

"Oh, no, it couldn't be - we have a team of experts who create these to the highest standards, blah blah blah.".

I asked again to go back, take a picture (crappy camera phone) and reviewed it again when I got home. It was wrong. Something to do with how references behaved in PHP5, and the exam's answer was right for PHP4, but the test was for PHP5.

Anyway, it was a bit depressing, and I don't want to have to go through those sorts of tests again if I don't have to.

danielweber · on May 28, 2013

"One of your questions is wrong" is an excellent filter. People should be thrilled to get that information.

I corrected two questions on a multiple-choice test that a finance firm gave me. They flushed me out after I handed in my personality test, so maybe it was for the best.

mgkimsal · on May 28, 2013

Yeah, you'd think it would be, but it's also indicative of someone who might be 'too big for their britches', or some other such nonesense. Given how many fakers I've encountered doing crap work for people (not just in software, but any service), I'm sadly inclined to realize that when I say "one of your questions is wrong", I'm very likely to be dismissed.

bediger4000 · on June 3, 2013

In the early 2003 time frame, I ended up taking a bunch of multiple choice tests about Java, C and C++. Every single one of them had things wrong. As near as I can tell, the auto-capitalization feature of Microsoft Word caused most of these problems. For example 'boolean' and 'Boolean' denote very different types in Java. In one test, they meant 'boolean' but had 'Boolean' written out.

SatvikBeri · on May 29, 2013

I would definitely be thrilled. I wonder if intentionally putting an incorrect assumption in order to see if someone brings it up would be a good indicator. The idea wouldn't be to penalize someone who didn't notice the assumption, but to reward anyone who noticed the mistake and brought it up.

Tyrannosaurs · on May 28, 2013

> Then I asked, "Why bother with this after reading my resume?"

You're aware that not everyone tells the truth on their resume right?

> Is this the state of our industry now?

Now? It's always been this bad... Happens when demand for talent out strips supply - the bar keeps falling.

VLM · on May 28, 2013

"Wow. Is this the state of our industry now?"

Traditionally this happens when HR demands "10 years experience with windows server 2007" and you somehow snuck thru (or maybe worked at MS on the dev team, or were in a beta program or...) anyway by definition the only people making it thru the filter are going to have a weird/cool/interesting background or are going to be pathological liars who inevitably will only get 32 on the 40 point test, because they filtered all the real applicants out by applying a bad filter.

I guess the SQL analogy would be something like:

SELECT COUNT(*)

FROM applicant

WHERE applicant.skill > 40

HAVING applicant.skill < 10 AND applicant.honesty < 0

chrislomax · on May 28, 2013

Without seeing any real data, wouldn't that all be a WHERE clause? I wouldn't imagine the skill was an aggregate of the applicants.

I've just skimmed over this though, I'm not having a go

VLM · on May 28, 2013

Ah I was trying to imply two sets of filtering and probably have them reversed, where HR is filtering on lack of honesty which only implies lack of skill or at least average skill, whereas the secondary filtration (the having clause) would imply the test for skill.

versk · on May 28, 2013

Don't you need a group by clause in order to have a having clause?

VLM · on May 28, 2013

This is yet another of those things that gets the pgsql folks all wound up about mysql. mysql is generally permissive best effort rather than restrictive follow the spec, so HAVING is allowed to reference stuff not in a GROUP BY or an aggregate (like MAX or COUNT). As you imply, this is not allowed by the SQL standard so philosophically I would Strongly Expect pgsql to error out unlike mysql. I donno what ms-sql does, don't use microsoft stuff. Oracle costs too much, so another I donno.

About 99% of the noise about mysql vs pgsql boils down to this overarching philosophical different of "try yer best" vs "only perfection is permissible". There are minor other differences aside from that, none of which I can remember at this time.

I was mostly trying to make a joke and making the psuedocode kinda sql inspired rather than cut and paste into a window like a stack exchange answer. I could have implemented it "properly" as a nested subquery I suppose. Or to make the point a little more .. obviously, just "select 0;"

socialist_coder · on May 28, 2013

That's messed up. "Having" should only be used for filters on the output aggregate functions, and "where" should be used for filters on the input row data. If mysql lets you use "having" when you mean "where", that is unfortunate.

example:

    select count(1) cnt, department
    from sales
    where department_id in (1, 2, 3, 4, 5)
    group by department
    having count(1) >= 100;

So, it filters out all the input rows to only those department ids, and then it filters out the aggregate output rows to only those with a count() of 100 or more.

This is how Oracle and MS-SQL server work.

genwin · on May 28, 2013

In SQL Server a HAVING without GROUP BY is a way to filter out duplicates.

jmelloy · on May 29, 2013

It seems to error out.

  select name, count(*)
  from queries
  having count(*) > 1

Column 'queries.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

genwin · on May 29, 2013

Like this. There is more than one way to do it:

  -- List employees who have the biggest salary
  -- in their departments
  select
      Name
  from
      Employees e1
  where
      exists
      (
          select
              1
          from
              Employees e2
          where
              e2.DepartmentID = e1.DepartmentID
          having
              max(e2.Salary) = e1.Salary
      )

acdha · on May 28, 2013

It's been like that since at least the late 90s when the spike in wages versus most other jobs both increased the financial benefits from overstating your experience and decreased the cost of having to find a new job if you were actually caught, which is surprisingly uncommon.

One major factor: companies delegate a great deal of hiring to recruiters, in part because it's expensive to maintain in-house experts. Since recruiters are usually paid by the hire rather than for sending qualified applicants there's pressure to simply use the shotgun approach of sending as many remotely plausible applicants over and hoping one of them will be hired rather than spending the non-trivial amount of time needed to find a great fit. Sometimes resumes are even altered by the recruiter to add required skills – and that probably does work well at large organizations which either don't hire well or where HR tosses every word they've used in past listings into the job description.

genwin · on May 28, 2013

I was once tasked with hiring some developers. HR used filtering software, so every candidate had C++ on their resume even though we didn't need that. Out of curiosity I asked each candidate a simple C++ question. Not one had a clue.

oneandoneis2 · on May 28, 2013

You've heard of the FizzBuzz test, right?

jebblue · on May 28, 2013

I heard about it in the last year or so here on HN, otherwise I wouldn't have known what a FizzBuzz was. I've been coding for 20 years professionally and 30 for fun. I've never coded a Fibonacci either. I think colleges need to teach how to code a microcontroller to do something, build a multi-platform application, build a database application, set up a CI server, etc.

Bognar · on May 28, 2013

We use FizzBuzz as a very basic first filter in our hiring process. It saves us a lot of interview time considering some 60% of candidates fail (or give ridiculously over-complicated implementations). Resumes are nothing but an exercise in creative writing, it seems.

twistedpair · on May 28, 2013

Fibonacci? That's a recent invention, right?

metaphorm · on May 28, 2013

not sure what you mean?

coding a simple function for evaluating the fibonacci sequence is a reasonable alternative to FizzBuzz that allows for some slightly more sophisticated requirements: like "code a recursive function that evaluates the first N members of the fib sequence. use memoization in your implementation and show how this runs in O(n) time complexity."

3825 · on May 28, 2013

I think he was joking based on the fact that the Fibonacci number is at least an eight hundred year old concept.

PeterisP · on May 28, 2013

Maybe the previous person was an emacs guy? Seriously, what would be a valid reason to sort employees based on which specific editor they are habitually using?

_pmf_ · on May 28, 2013

> Seriously, what would be a valid reason to sort employees based on which specific editor they are habitually using?

It's not for filtering out really good candidates, it's for filtering out abysmally bad candidates. Show me the dumbest Vi user and the dumbest Eclipse user you can find and you might learn something.

kenko · on May 28, 2013

And people who've never used vi are ipso facto abysmally bad?

gcr · on May 28, 2013

Ignoring ed, vi is the only sane editor that you can depend on to be installed on every system you work with. Even if you know emacs, it's common to at least be familior with enough vi basics to fix a broken system (hjkl, i, :q, :w)

PeterisP · on May 28, 2013

The original interview was claimed to be for a developer position. If you're a developer in most companies, you never have to handle a random broken system, as usually any and all deployment or production maintenance is separated. DevOps is still an exception, not the rule.

You manage your workstation and your dev/test environments or VMs at most - they have the exact editor setup you like. The only interaction between your computers and "foreign" systems is the code version control system. Your editor, no matter how rare or exotic, is guaranteed to be installed on every system you work with - if you work on your systems, not manage systems of other people.

Even the OS doesn't need to match. You can easily code for Linux deployments on a MacOS or Win machine, and never touch any Linux computers. Heck, you even can code for Windows deployments on Linux machines, though sometimes testing that may be a mess and requires a VM - but you certainly can do that.

benjiweber · on May 28, 2013

Unless you have the misfortune of having to work with windows systems :)

_pmf_ · on May 28, 2013

> And people who've never used vi are ipso facto abysmally bad?

A few false negatives might be acceptable.

kshahar · on May 28, 2013

Jeff Atwood has written about a similar experience in "Why Can't Programmers.. Program?": http://www.codinghorror.com/blog/2007/02/why-cant-programmer...

specialist · on May 28, 2013

We also set up an online test to screen candidates. Because HR and the recruiters kept sending us such terrible candidates.

I just can't understand how people don't score 100%. Alas, not many do.

It's saved us devs a HUGE amount of time.

chrislomax · on May 28, 2013

I dislike the whole interview process, I'm only young so the interviewing process is still very new to me. We find that people lie heavily on CVs to the point where it is annoying. We had someone once who had 5 years CSS experience yet found it hard to center some text.

I think an online test would help our cause some but I don't know how simple to make some of the questions. How long would you expect the candidate to be taking the test? Do you tailor the test to the candidates?

tetha · on May 29, 2013

It's dunning-kruger and lying in full swing, I feel.

Some people go like "Oh I worked 5 years in that tool, I must be magnificient in this tool. I shall list myself as highly professional in this area!" and do so.

Me... I rather go like "Hmm... Well I worked with C for 3-4 years, but I'm rusty as hell... and I didn't touch many libraries much, I just worked on tiny microcontrollers and implemented a highly multithreaded programming language execution environment. Guess I'm gonna call me <somewhat experienced> there." Though afterwards I just roll over people in an interview.

darksaints · on May 28, 2013

The positions on my team usually call for people that have a strong mix of business and technical skills. We get business analysts with programming that they've picked up along the way or programmers with MBAs. Typically, when the programmers get dropped from the process, it is when we test their SQL. Sad, but yes, this is the state of the industry.

gregors · on May 28, 2013

And yet people still ask if automated testing is needed. I heard there's a new research project being funded called "People: They make mistakes. Again. And Again."

codegeek · on May 28, 2013

"List employees (names) who have a bigger salary than their boss"

    SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN     Employees e2
    ON (e1.BossID = e2.EmployeeID)
    WHERE e1.Salary > e2.Salary

"List departments that have less than 3 people in it"

    SELECT d.Name, COUNT(e.EmployeeID) FROM Department d LEFT   OUTER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name HAVING COUNT(e.EmployeeID) < 3

"List all departments along with the total salary there"

    SELECT d.Name, SUM(e.Salary) FROM Department d INNER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name

"List employees that don't have a boss in the same department"

    SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN        Employees e2
    ON (e1.BossID = e2.EmployeeID)
    WHERE e1.DepartmentID <> e2.DepartmentID

"List all departments along with the number of people there"

    SELECT d.Name, COUNT(e.EmployeeID) FROM Department d 
    LEFT OUTER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name

ohwp · on May 28, 2013

Curious question: why do you always use table aliases? To keep your query shorter? When I don't need an alias I just use the full table name for readability:

  SELECT
    Department.Name,
    COUNT(Employees.EmployeeID)
  FROM Department
  JOIN Employees
    ON Employees.DepartmentID = Department.DepartmentID
  GROUP BY Department.Name
  HAVING COUNT(Employees.EmployeeID) < 3

codegeek · on May 28, 2013

Good question. I always find it easier to have the aliases because sometimes, table names are too long for me to remember. Also, tehre are times when we join the same table by itself and at that point, I use x1, x2 etc. In general, aliases always work while direct table names may not work for all cases. So i just keep it simple.

jeremysmyth · on May 28, 2013

There's also the cognitive overload of reading long identifiers, multiplied by the naming conventions of some large corporate databases.

I'd rather see:

    EmployeeReferences eFrom JOIN EmployeeReferrals eTo

...and then see

    ON eFrom.ID = eTo.ID
    ...
    JOIN xyz
    ON eFrom.Source = ...

rather than have to read acres of EmployeeRe-something 4 or 5 times through an 8-table BI join.

Similarly, I've had to deal with (admittedly legacy) tablenames like A12R18SALE and A12B14PROD. Aliases come in really handy there.

dspillett · on May 28, 2013

I always use table aliases as a matter of habit:

* I always specify columns as table.column, not just column as it makes things explicit where the could be ambiguity if a less experienced coder is looking (I know that column reference in a correlated sub-query refers to the most local instance of that table, but having the table name there explicitly states that referring to that was my intention and not an accident). Having short aliases saves typing in this instance (though not too short/arbitrary - the object names should still be meaningful in the context of the query: a, b, c, d, ... would generally be bad aliases)

* If the query gets more complex and needs to join objects in that have columns of the same names as those in existing objects (especially if you add another reference to an object already in use in this query), you've already got the aliases there for the first instance reducing the chance you'll get one wrong when adding them in for both instances of the same name.

genwin · on May 28, 2013

I use to use full table names if only to avoid inconsistency with aliases in other code. Now I use aliases and ignore inconsistencies. Being a purist wasn't worth it.

jfb · on May 28, 2013

Too much typing.

ohwp · on May 29, 2013

Get a better IDE ;)

twistedpair · on May 28, 2013

You assume that all people have bosses. The top boss has no boss. You also assume that all department names are unique.

Nitpicking, yes, but these questions certainly allow for a deeper discussion with the interviewer.

brown9-2 · on May 28, 2013

You assume that all people have bosses. The top boss has no boss.

Isn't that why a left outer join is used?

Besides, if you are referring to question #1, a person without a boss can never be a part of the set of employees who have a higher salary than their boss.

genwin · on May 28, 2013

Two assumptions to be clarified in the interview: 4th question above doesn't apply to the top boss (who can't have a greater salary than a boss he/she doesn't have, so no issue on the 1st question). Department table has an alternate key on department name.

shawabawa3 · on May 28, 2013

> people often do an "inner join" leaving out empty departments

Empty departments have less than 3 people

codegeek · on May 28, 2013

correct. Edited.

3pt14159 · on May 28, 2013

I was following along (without peeking ahead) and I briefly thought "What about NULLs and empty joins?" But I figured, it is an idealized test. For example, what happens when a boss has a NULL department id? Would it be safe to say that they are in a different department than their underling? SQL says no.

Besides that, I think this is a great test. Personally, I start off a bit slower so I don't embarrass people that don't know SQL.

pluies · on May 28, 2013

My own answers, with test data: https://gist.github.com/Pluies/5663135

Despite thinking I knew SQL reasonably well, I wouldn't have fared very well at all in an interview setting. :/ Took more time and googling than expected.

codegeek · on May 28, 2013

didnt google but I had to create some example data to think about the solutions. So yes, i would not be able to do it in 5 min.

genwin · on May 28, 2013

I'd give someone props for getting these questions all correct in half an hour, even if some could do it sooner.

slc · on May 28, 2013

Probably not the best idea to group on the Departments.Name instead of Departments.DepartmentId ..

genwin · on May 28, 2013

On the first and fourth queries you don't need LEFT JOIN because the WHERE clause guarantees that only INNER JOIN rows will match. (Presumably neither question applies to the top person.)

On the third one you don't list all departments; the empty ones are filtered out. Needs a LEFT JOIN.

btilly · on May 28, 2013

The question List all departments along with the number of people there has an answer using a correlated subquery, rather than a join.

I have a relevant story about that.

About 9 years ago now, another developer escalated a bug to me. Every time they ran a complicated auto-generated query, they got logged out of Oracle. No way! I tried it. Happened to me. Began trying to narrow it down. Ran out of connections. Got a DBA to unwedge the machine. Began again. Ran out of connections again. Got the same DBA to unwedge the machine. Received a lecture about not opening so many connections, replied that I was tracking down a bug and had no choice. Showed him the bug. He was astounded.

Not long after I finished tracking it down and sent them the fix. Showed it to the DBA who verified that it had been reported already, and was fixed in the next release.

The bug was that any time there was a correlated subquery with no records, Oracle logged you out. My guess is that something, somewhere, followed a null pointer. The obvious solution was to move to a left join. If I remember correctly, the way it was being autogenerated made that hard. My solution was to have a correlated subquery which was a left join on DUAL so that there was always a record.

krsunny · on May 28, 2013

Relevant story? You're hired!

aidos · on May 28, 2013

These questions are very simple, though I guess they cover a few of the core concepts. Basic selects, joins, joining the same table twice, left joins and group by.

I'm most worried by the comment "(tricky - people often do an "inner join" leaving out empty departments)". That's a basic question and if that's considered "tricky" you've got a real problem on your hands.

Maybe if you're hiring for a junior position you could excuse someone not knowing about left joins. If it were for a position that had any sort of focus on db work I would pass on the candidate (caveat, when hiring juniors I look for desire to learn above most everything else).

Obviously I'm getting old. "Back in my day" a basic understanding of SQL was just part of the job. Didn't matter what you worked on - you should be able to work with relational database. I'm concerned that the attitude of "I don't need to know that - my ORM does that for me" has become the default outlook. Over the last few years I've had to convince developers several times that the complex aggregation they're writing in their script would be easiest solved by using SQL. Unfortunately, increasingly it seems that newer developers aren't even aware that these tools are available - or how to use them.

If nothing else, relational algebra is a wonderful and elegant subject that is worth learning.

Darn kids, get off my lawn! :)

wmil · on May 28, 2013

> That's a basic question and if that's considered "tricky" you've got a real problem on your hands.

If they aren't warned then it's reasonable to assume that every department has employees. Otherwise why would it exist?

aidos · on May 28, 2013

Fair point. Experience has taught me to internally question whether or not each relationship should use a left join or not. The fact that it says "List all departments" made me think it should be a left join. In fact, to me that screams "left join".

Really, in a philosophical sort of way, it's the use of the query that determines which join to use. They want ALL departments, you give them ALL departments. Using a left join you protect yourself in the future - with a performance hit. As you say, if there was a guarantee that the data didn't contain empty departments you'd use an inner join.

Most importantly, a reasonable developer should know to ask the question - if not of the examiner, at least to themselves. Just making an assumption is not the right approach.

dspillett · on May 28, 2013

It is a good way to expose real world experience.

Client requirements are usually vague enough (heck, internal requirements are sometimes vague enough) for there to be problems like this discovered later. or a given business it might be valid for there to be departments that are empty at a particular time.

Someone with extensive real world experience will know to never assume any detail not included in the specification no matter how much of a non-brainer that assumption might seem at the time. They will either ask if empty sets need to be considered, or they will preface/suffix their answer with "assuming there are no empty departments or you don't want to report on them if there are" or "assuming there might be empty departments and you want them included in the report" - either way they are showing an ability to parse requirements and identify possible ambiguity that should to be queried.

You've got a multi-level differentiator there:

* The bad candidates will not be able to give a working answer

* The fine candidates will give a working answer though might miss the exact requirement (as it isn't properly stated and they just assume)

* The best candidates will spot the deficiency in the spec

Nick_C · on May 29, 2013

I can think of two business reasons straight off: 1) a department changed its name, or 2) the department used to have employees but is now defunct, such as the Typing Pool department.

My point is, to try to be aware of the assumptions that you are otherwise unconsciously putting in to your program's business logic.

zhodge · on May 28, 2013

I'm an aspiring junior developer, and I found Stanford's (online) "Introduction to Databases" course to be excellent. I particularly enjoyed the relational algebra section where we wrote relational algebra queries by hand.

From the newbie's perspective, sometimes it can feel as though beneficial resources on the basics of subjects like relational algebra, SQL, and relational databases are somewhat sparse.

Unfortunately, I did not have enough time to finish the class completely (it required, as it should, a significant amount of time dedicated each week), but I'll definitely be re-enrolling next semester as long as it is offered online again.

dirktheman · on May 28, 2013

I feel exactly the same way. I'm the first one to admit that an ORM can save you a huge amount of time for basic database stuff. However, as queries get more complex I always write them in raw SQL. It's just faster for me. I'm shocked that someone who applies for a programming job wouldn't get these questions right. They're not exactly rocket science, right?

SkippyZA · on May 28, 2013

I have just been on the job market looking for a senior PHP position. There were so many companies that requested tests from me. Either in the form of online tests, or tasks which I had to complete and return. While I do understand the need for them (having had to hire other developers), some of the requests were quite outrageous.

1 particular company basically wanted an entire application to be developed in an evening, and I was giving strict instructions to focus on security and not allowed to use external libraries. After submitting this elaborate task, I was still criticized on using PDO (which is standard with PHP...).

IMHO, sometimes the lengths employees go through to find a developer are so ridiculous that they actually drive away people.

bluedino · on May 28, 2013

I recently took an SQL skill assessment test from one of the big 'testing' sites. My first problem with the test was that it was a mix of Oracle and MS SQL, when my resume said 'MySQL'. And there were questions such as 'What is the MS SQL equivalent to the Oracle keyword xxxx?' Luckily I've used it enough to not bomb that portion. To be expected with a recruiter...

Anyway, some of the other questions were pretty silly like "Which of the following is a DDL command?", and many were SELECT statements with a syntax error that you had to pick out, and probably the one question that made sense was about the difference between WHERE and HAVING.

Gravityloss · on May 28, 2013

If their recruiting is so incompetent, maybe the company is clueless otherwise as well?

gfosco · on May 28, 2013

This is a great way to start... A defined schema and very clear and simple requirements, I could dash these off really quickly, and then we could discuss optimization. So much better than asking me to define a join. I miss SQL. http://stackoverflow.com/search?tab=votes&q=user%3a35398...

Tyrannosaurs · on May 28, 2013

Seems to me to be a solid test, though I might include a small amount of sample data to nudge them in the direction of some of the potential issues (empty departments and so on) - maybe I'm just kind like that. ;-)

In interviews I've always been amazed how few people who claim to know SQL can use GROUP BY, HAVING and aggregate functions (or depending on the question self joins or sub queries that will allow them to achieve some of the same things).

My normal question is to present them a table with a level of duplication ask them to write something in SQL that identifies the duplications and something that removes them which covers much of the same syntax.

jcampbell1 · on May 28, 2013

Removing duplicates is dependent on the underlying database because the behavior of deleting while selecting varies. I know MySQL doesn't allow it in most cases, so you need a temporary table. Identifying them is a fine question.

VLM · on May 28, 2013

You can get a list of dupes to eat without a temp table with something like:

select min(id), count(*) as dupecount from yer_table group by some_hash_identifier_or_whatever having dupecount > 1

And then just iterate thru delete from yer_table where the id = the min(id) as fetched above.

Or maybe your business logic is to keep the oldest record and zap the newest. Or based on some column data rather than simply age.

Now the really interesting discussion is how often this happens (like once-off, or every 10 seconds, or), and how scalable you need it to be. Are you talking about 100 records or 100e6 records. Also literal duplicates as in "two" works pretty well but not so good if there's 50 duplicates and you need to delete 49 of them. Of course for 49 duplicates you could select the identifier hash and the lone lowest ID and delete all entries with the same identifier hash where the id isn't the lowest id for that hash...

Tyrannosaurs · on May 28, 2013

I give a basic single table schema with sample data so they have all the information they need to work with.

megaman821 · on May 28, 2013

If you can only answer these using an ORM, then you really shouldn't put knowledge of SQL on your resume.

I am not really into giving programming tests to interviewees, but if you claim to have experience with something you should be able to answer simple questions about it.

jfb · on May 28, 2013

I wouldn't treat this sort of thing as dispositive, and if I were doing hard-core SQL development, I'd dismiss it entirely and start the interview with much hairier wizardry; but for a generic, gonna write some queries but mostly live outside the database kind of role, the five minutes or so this sort of test takes at the beginning of the interview gives me a strong indicator of how to assess what the candidate actually knows, rather than what is represented on their resume. It is a guide for the actual meat of the interview.

david927 · on May 28, 2013

These are nice, but you need simpler questions.

Hear me out: Take away two questions (the first four are enough anyway) and add two to start out with that are much simpler. You will be shocked how many people will fall out at that level.

Years ago, when I first started hiring, a friend told me about this and I didn't believe him, but I tried it anyway. I was astounded how many people were completely bluffing. It helps expedite the whole process.

lkrubner · on May 28, 2013

This comment was very good:

"This is exactly why reliance upon ORMs has had a huge negative impact on engineering. Most of these are easily solved with Group By, Having, and/or other aggregate functions, but the ORMs have created this veil of complexity."

If ORMs really simplified the underlying complexity, so I didn't have to think about it, then ORMs might be worth it, but I have never worked on a large project where, at some point, I was wholly free of the underlying technology. If its a project that I work on for a year or more, there is always some moment when I need to drop down to SQL.

300bps · on May 28, 2013

This is much more useful than the typical question I've received at some companies - "On a scale of 1 to 10, how would you rate your SQL knowledge?"

mgkimsal · on May 28, 2013

I've had this and have answered something like:

"Compared to most other devs I work with, 7 or sometimes 8. Compared to people who do SQL for a living, possibly a 4, or a 5 if I'm feeling cocky. It all depends on what the '10' really represents - the best of the best, or the best of the people I'd be working with."

Well, something like that. That last bit - I've said it more tactfully in the past.

jfb · on May 28, 2013

This is the only way to answer this question.

mgkimsal · on May 28, 2013

Thanks :)

Said wrongly, it implies that the company has crap people working at it, and that's generally not a way to get hired. But most companies know they're not getting the 'best of the best' when they hire - they're getting the best they can afford in their geographic area during a certain time frame.

packetslave · on May 28, 2013

When I was interviewing, I was told to think carefully about my answers to the "rate yourself 1-10 on your Python/Java knowledge" pre-screen questions.

If you answered 10, you just might find Guido or Josh Bloch on your interview panel.

brown9-2 · on May 28, 2013

This seems like a great sign of the overall condition of the company, or at least their hiring practices.

Has there ever been a great company or great interviewer who would seriously ask this question?

darksaints · on May 28, 2013

Everybody answers 8.

stiff · on May 28, 2013

Somewhat surprisingly most web developers I know know very little SQL, having picked it up exclusively by tinkering to get things done any way whatsoever, even if clumsy or slow. In fact SQL might look deceptively simple at times, at one point I read an ANSI SQL book so I already had some formal education in SQL when I started doing webdev, but I only really learnt ANSI SQL at the university in the databases course, and then I still had to do more learning about many details of my DB server of choice (postgres), including things like spatial queries and indexes, full-text search etc., you can get huge speed ups and infrastructure simplifications by putting those kinds of things directly in the DB.

Ask people about difference between LEFT JOIN and RIGHT JOIN, or using the schema from the article, to select all attributes of employees with the highest salary in their department in pure SQL and you will see how much or how little people know, in fact many webdevs don't even understand JOINs at all!

arry · on May 28, 2013

Which book would you recommend for learning this stuff? I'm not very interested in 800-p. gorillas; there surely must be something short, not too theoretical, and to the point.

poxrud · on May 28, 2013

I enjoyed SQL Antipatterns by Bill Karwin. Very easy to read and offers some practical approaches to common issues.

qohen · on May 29, 2013

Bill Karwin's "SQL Anti-Patterns Strike Back" presentation on Slideshare.com presentation is worth checking out -- it's 250 slides long, covers 4 kinds of anti-patterns (in queries, DB creation and the design of both DBs and applications). And he goes through actual code examples.

Check it out here:

http://www.slideshare.net/billkarwin/sql-antipatterns-strike...

jacques_chester · on May 28, 2013

A little theory goes a long way in SQL.

Pick books focusing to SQL, not on databases overall.

Most database textbooks (Date, Ramakrishnan & Gehrke) cover a lot of ground, rather than just the language.

jackmaney · on May 28, 2013

I'd recommend Head First SQL (http://www.headfirstlabs.com/books/hfsql/). I started on that book with no real coding knowledge whatsoever (except a long-forgotten Java class that I took circa 1996). Plugging away on that book a few hours a week for a few months taught me enough SQL to get my foot in the door to a new career.

mkoble11 · on May 28, 2013

Ben Forta's Teach Yourself SQL in 10 minutes http://www.forta.com/books/0672336073/

brixon · on May 28, 2013

My first weed out question is asking them to describe a Left Outer Join. They don't have to get it exactly right, I just want to see if they ever did anything more than a two table inner join.

For a Web Developer the first weed out question is to tell me the difference between a GET and an POST. Here all I really want then to know is that a GET is what generally see in the URL and a POST is commonly what you see in HTML Forms. I want to see if all they ever did was ASP.NET WYSIWYG web development or if they actually know something about the internet.

These two questions can be done in a phone screen. The faster that you can weed out people the cheaper the hiring process is.

jasonkester · on May 28, 2013

Careful using that terminology. You'll get false negatives on people who still think in terms of *= syntax, and will tell you that the words "left", "outer", and "join" are all redundant and probably don't belong in SQL in the first place.

I used to be one of those guys, but I'm much less grumpy about it these days so I'd still pass your test. I have a sad suspicion that I'm on the progressive end of the spectrum when it comes to guys who deeply understand SQL.

ajuc · on May 28, 2013

Yep, I've done a lot of PL/SQL programming, and we've always used (+)= syntax and joins with all the tables after commas and all the joining conditions after WHERE.

Now I work on different project and we use join syntax, but I could easily imagine people that do joins all day, and not know JOIN .. ON .. syntax.

jacques_chester · on May 28, 2013

I came to Oracle after it adopted the ANSI syntax, so that's what I use. So my experience is the opposite of yours -- when I see the (+) I need to look up the syntax to remember if it's left or right outer.

jfb · on May 28, 2013

And then I ask, hey, where's my BOOLEAN? And then I drink.

jacques_chester · on May 28, 2013

Oh god.

And the lack of a serial/autoincrement/identity type.

So. Many. Effing. Triggers.

And 32-character identifiers.

sigh

tmzt · on May 28, 2013

It's so much better to use a database that only allows one autoincrementing value per record, or one TIMESTAMP and then only allows you to have either a create timestamp or an update timestamp without writing a trigger.

I prefer the way Oracle does it, you may have to do more work but it's more explicit and flexible that way.

jacques_chester · on May 28, 2013

I don't. I prefer for the common case to be correctly and automatically handled for me.

brixon · on May 28, 2013

True, I use it for a conversation starter. You can tell the split second after you ask the question if you will get any kind of reasonable answer.

I personally have used the *= syntax more than the LEFT syntax, but that concept has helped filter people since I am not allowed to "test" people.

tmzt · on May 28, 2013

Obviously a GET has variables and a POST doesn't (that's why they are called "GET variables")

[I've encountered this belief more than once working with PHP developers. I would hope that the answer was closer to something demonstrating knowledge of HTTP as a protocol.]

zerr · on May 28, 2013

Good point - that they don't have to get it exactly right. For me, it would be enough if the candidate is just aware that there are different types of Joins for different scenarios.

Ecio78 · on May 28, 2013

Interesting, so I think I'll pass your first screening even though I'm not a (web) developer but (most of all) a system/network engineer :D

xanadohnt · on May 28, 2013

If the position you're filling is directly dependent on more-than-average SQL experience - creating a DB driver, an ORM, for ex. - then SQL-specific questions are applicable. But, by and large, this type of specific-knowledge testing is not very useful. I want to see a developer's general abilities at problem solving and the source code to back it up. If you have solved complex problems in C# - and can prove it - then you certainly as hell can solve complex problems in Go despite not having any experience there yet. Sure, if I'm trying to fill a Go position and someone has proof they're an excellent developer _and_ it's in Go then they'll get top consideration.

Being able to write SQL queries from memory has little correlation to a candidate's level of ability. Personally I consider myself a fairly strong developer and it hasn't been only until the last year that I can now write pretty complex joins from memory. And I've been developing for 20 years. Only because of a recent project and the volume of queries I had to write did my method change from using a graphical query writer to simply memorizing the syntax I need. Indeed, this very type of adaptation is something I look for in candidates.

jfb · on May 28, 2013

If you can't handle these sorts of queries in a forgiving interview format, then by definition you are not a strong developer in SQL. That is not to say that you are not a strong developer in general; or that you couldn't handle a job where you had to interact with a SQL datastore; merely that the interviewer is not going to be able to talk SQL with you.

joshyeager · on May 28, 2013

Since SQL is a declarative language, you have to use it very differently than procedural or functional languages. I've met many people who don't understand the difference and just write procedural code in their SQL queries (cursors, etc). The output may be correct, but performance on anything bigger than a toy dataset is terrible.

If your team builds massively parallel systems in Erlang, you need to make sure a candidate understands at least the basics of its process model and message passing. If your team build high-performance web apps, you need to make sure a candidate understands at least the basics of HTTP and the difference between client and server. For SQL, the same is true: they need to understand at least the basics of the relational model and declarative programming.

pessimizer · on May 28, 2013

These are very simple, though. If you say that you know SQL (or have known SQL, but are a bit rusty) and you can't at least make a strong showing on these questions, you are lying.

This is not complex problem solving, this is problem solving. I'm not sure what memorization has to do with any of this. SQL is a language with only maybe 10-15 important words. It's not like trying to get around Paris without a phrasebook as a non-French speaker.

I've found that using graphical query writers (I only know of the one in access) and ORMs are generally harder than writing the SQL. ORMs are good for keeping code db agnostic, though.

xanadohnt · on May 28, 2013

It's called Query by Example and there are quite a few DB management GUIs that have it. I found it pretty intuitive for setting up joins by right-clicking join lines and setting the particular properties of that join. Of course now that I have committed all types of joins to memory it's far faster to write them simply by hand. But my lacking that memorization 10 months ago wasn't really insite into my ability (or lack thereof) as a developer.

nahname · on May 28, 2013

Even the worst developers I have worked with could figure out SQL queries like this in an afternoon. If they study before the test, congratulations, we just hired a dud.

xanadohnt · on May 28, 2013

You made my point much more succinctly than I. Mainly, this type of candidate testing tells you very little (positively or negatively) whether they're strong developers. Consider an OS developer - the person may have never EVER been exposed to SQL beyond "select * from table" but is probably superior to any candidate that may ace this little SQL test. Mrs. OS developer can learn any amount of SQL syntax (and deep understanding of it as @joshyeager points out) within weeks of being hired.

brown9-2 · on May 28, 2013

I think you are assuming that the company is using SQL interview questions to test whether or not the candidate is a strong overall developer than a strong SQL developer. It seems to me that they are testing for a very specific skill and area.

VLM · on May 28, 2013

I like the little schema, it flows right into more advanced discussion about how you'd deploy indexes based on the design and queries, how you'd expand the schema in normalized form into supporting an office building seating assignment for each employee, or even multi-sites for employees using a many-many table.

One thing I didn't get was one comment on the article that a guy could struggle thru this with phpmyadmin but not at the console. Maybe he was kidding or trolling. I recently install phpmyadmin to fool around and I can't imagine talking about using it, you'd have to click like fifty thousand times just to implement just a simple query and it would probably take 15 minutes, yet not reduce the cognitive load at all. How do you talk about GUIs in an interview? "Click on the icon of the fornicating centipedes, then on the cthulhu icon, then in the ribbon select the turtle crossing street sign" It makes talking about regex's seem humane in comparison.

tmzt · on May 28, 2013

I've switched to Chive DB (chive-project.com) precisely to get away from the inanities of the PHPMyAdmin interface, it's much easier to just enter the SQL. (Working on a Chromebook so not using a native application for this.)

kamaal · on May 28, 2013

There is a very simple way of testing SQL knowledge. And you don't need any of this online tests or white board programming stuff.

Build your self a small sqlite database. Nothing much, but sufficient enough to test the candidates ability write queries. Give him a manual. No internet connection and now give him problems(a few select queries, joins, inserts and may be a few tests here and there to test how good the guy is in schema design). If the guy can write queries after reading the documentation, then hire him.

If he can't write queries, I mean practically on the computer and show you results he is not of much use. Even if he can answer all your white board answers.

This is applicable to any programming interview. If a person can read documentation well and find his way to write a program to solve a problem such a person makes a good hire.

acjohnson55 · on May 28, 2013

I could nail those with the Django ORM, but I'd struggle to write syntactically correct SQL, not having done it in a while. But it says that your test machine has MS-SQL; with the machine in front of me, I could probably puzzle it out the join quirks with a couple minutes of trial and error.

jfb · on May 28, 2013

So the question is, do you represent yourself as a SQL developer? If not, no drama.

Tyrannosaurs · on May 28, 2013

I think if you don't give someone a machine you should be pretty forgiving about exact syntax.

Someone who can put together a statement that's broadly right with small errors normally means someone who is rusty (or nervous) but knows their stuff rather than someone who is guessing and a couple of follow up questions will usually confirm that.

General rule for me: don't ask anything that a decent IDE or 10 seconds with a manual / help file / Google would have prevented (unless you've given them a decent ID or similar in which case it's fair game).

tomp · on May 28, 2013

How would you solve the first one using Django ORM?

dalore · on May 28, 2013

Employee.objects.filter(boss__salary__lte=F('salary'))

Find me employee objects which have a boss salary less than or equal to the salary.

PeterisP · on May 28, 2013

Out of curiosity, would the ORM map to the same SQL query? Or would it request all employee-boss pairs and filter them outside of the DB?

There's a huge performance difference involved.

freework · on May 28, 2013

Potentially huge performance difference.

For the 99% use case, the performance hit fo the ORM is not significant enough to matter. Most projects have many tables, but only one table that actually needs to have any speed optimizations. That one table can go in NoSQL and the rest can be handled by a ORM.

metaphorm · on May 28, 2013

this is a good question. my experience with the django ORM is that in general it resolves any call to an ORM method into SQL and nothing else, but it is not guaranteed to generate the most optimal SQL.

It is however, a somewhat common practice of django devs to to do some post-processing on a queryset in Python. totally acceptable for small querysets with complicated logic, but, yeah, obviously unacceptable for large performance critical queries.

dalore · on May 30, 2013

It will only be one query. The django orm won't do any filtering outside the DB that is up to you.

damon_c · on May 28, 2013

I believe Django would do this as a single query.

jschmitz28 · on May 28, 2013

It does not apply to the first question, but how does the ORM handle joined columns from other tables? Is it embedded somewhere in the returned object? The best way I could think to handle this using an ORM would be for the object to contain a set of arbitrary key/value pairs to contain joined columns but it seems like a hack.

megaman821 · on May 28, 2013

It is fetched dynamically e.g. employee.boss.department. That will usually cause two more queries to the database. You can use select_related and prefetch_related so that these objects will be loaded into the ORM in one or two queries.

danielsamuels · on May 28, 2013

  from .models import Employee
  from django.db.models import F

  print Employee.objects.filter(
      salary__gt=F('boss__salary')
  )

megaman821 · on May 28, 2013

    Employee.objects.filter(boss__salary__lte=F('salary')).values_list('names', flat=True)

pramodliv1 · on May 28, 2013

Nice set of questions.

I would also use some sample real world data to check if my queries scale well instead of inserting 10 rows and running all queries on it.

http://stackoverflow.com/questions/57068/good-databases-with...

jcampbell1 · on May 28, 2013

What is the preferred way to aggregate with nulls?

    SELECT  Departments.name, SUM(COALESCE(salary,0))
    FROM Departments 
    LEFT JOIN employees USING departmentID
    GROUP BY 1

The above is how I would solve the last one, but I often feel like I abuse COALESCE.

a8m88 · on May 28, 2013

Using COALESCE here is correct, but sum() returns NULL if there were no non-NULL inputs, so COALESCE should go around it:

COALESCE(sum(salary), 0)

zrail · on May 28, 2013

Aggregates generally do the right thing with null without the coalesce.

jcampbell1 · on May 28, 2013

Thanks. It appears I need to stop overusing coalesce. I was told that sql NULL means "A value that is not yet known", which nicely explains why 1+NULL, 1 < NULL, 1 > NULL, 1 = NULL is always NULL. Now I know that AVG(test_scores) produces the average of the known values automatically.

- - -

I just did a test, and it appears the COALESCE is needed in this case. Running an aggregate where all values are null, results in NULL (the empty department). You need to do something because the total salary of an empty department is known to be zero.

dragonwriter · on May 28, 2013

> Aggregates generally do the right thing with null without the coalesce.

Aggregates generally do the most-likely-to-be-right thing with NULL values if there is at least one non-null input to the aggregate. The thing is, if you depend on this, you'll run into real data situations where all the inputs are NULL, the result is NULL, and that's not what you expected.

If you are aggregating over an expression that can be NULL, and you always want a non-NULL answer, you probably need to use coalesce or something similar so that you don't have non-NULL inputs to the aggregate.

mhd · on May 28, 2013

A sample SQL file with dummy data would be much appreciated, I think. I guess some of us would like to try their luck.

At least this should be quicker in an interview situation than the usual "normalize this database" or "given this situation, define a database schema".

arb99 · on May 28, 2013

I made the tables quick to go through them. It could do with more data really but its got a few rows.

https://gist.github.com/abkr/5662615

petepete · on May 28, 2013

I had the same idea, but in an SQL Fiddle

http://sqlfiddle.com/#!1/778bb/5

mhd · on May 28, 2013

Didn't even know that there was such a thing as SQL Fiddle. That's going to come in handy, beyond this quiz, thanks.

lostsock · on May 28, 2013

In the same vain here are some very quickly put together answers if anyone is interested. I haven't tested these so they may not work. If you see any errors please let me know :)

https://gist.github.com/dritterman/5662750

numbsafari · on May 28, 2013

I like this set of questions and the simplicity of it. I think I would only add one or two simple questions about INSERT, UPDATE and DELETE statements.

Writing a bad SELECT doesn't tend to have the same ramifications as an incorrect DELETE or UPDATE statement.

nahname · on May 28, 2013

Anyone else bothered when primary keys are not given the name 'ID'?

jfb · on May 28, 2013

I'm bothered by primary keys full stop.

nahname · on May 28, 2013

What's wrong with identity?

VLM · on May 28, 2013

A concrete example is some folks like the abstraction of a row as your primary "thing" and some folks like the abstraction that the data defines a row and rows don't really exist just the data.

Consider the hated multiple primary key situation where you've got a autoincrementing prikey and a "real" key where you make an unique index off "full name" or something. So which is the real conceptual primary key? Shouldn't you use the full name as the "primary key"?

Problem: What if the business logic of what a distinct user is changes from unique "full name" to unique "full name" and "telephone number". Whoops now all your foreign keys need messing with, its just a bad scene. Ditto schema changes like you finally change from ascii to utf8 or something, now all your foreign keys need changing (well thats maybe a bad example unless your ascii datatype enforces 7 bits or you're running into byte length vs character length limits...) Or you change the length, which changes the truncation perhaps, which changes your foreign keys. Also you can't just use a rule like all foreign keys are BIGINT now some are CHAR(20) some are FLOAT who knows.

On the other hand lets say you implement just a prikey. Now you can have multiple rows with the same data, because you never set up a UNIQUE INDEX.

Generally speaking if you KNOW absolutely KNOW that your schema will never change, you should probably optimize it to not have multiple keys aka a primary key and unique indexes, or data definition will never change. Very few people can guarantee it so they're better off in the real world with imaginary prikeys.

You can read a lot more about this in "SQL antipatterns" I think chapter 4 or so, but always keep in mind that beyond noob level of being able to define the overall issue, short term snapshots will occasionally (but not always) conflict with longer term thinking.

dragonwriter · on May 28, 2013

> Consider the hated multiple primary key situation where you've got a autoincrementing prikey and a "real" key where you make an unique index off "full name" or something. So which is the real conceptual primary key?

If the full name is a real conceptual primary key, you shouldn't have introduced an autoincrement key. If the uniqueness of the fullname is a business rule but not a real conceptual restriction (a distinction which can be hard to make, to be sure), then it makes sense to create the autoincrement key -- and it is the only real primary key. (That is, the autoincrement key represents the concept of identity which isn't present in any of the other data.)

> On the other hand lets say you implement just a prikey. Now you can have multiple rows with the same data, because you never set up a UNIQUE INDEX.

No, you can't, because the "prikey", as you call it, is data, and has meaning -- specifically, it represents identity -- so rows which differ in it do not have "the same data".

VLM · on May 28, 2013

The business concept of the "real conceptual primary key" can change, perhaps dramatically, over time, as the business model changes. A real prikey never changes.

dragonwriter · on May 28, 2013

> The business concept of the "real conceptual primary key" can change, perhaps dramatically, over time, as the business model changes. A real prikey never changes.

This confuses two different concepts:

If the conceptual model changes, then, yes, the candidate keys (including primary keys) of entities may change between the old model and the new model. This can be a pragmatic difficulty in migrating between different conceptual models, but that's a problem inherent in different conceptual models.

The value of a well-chosen primary key of an entity within any given model should not change, as the primary key should always be a value which identifies the entity such that a different primary key means a different identity.

jfb · on May 28, 2013

ON UPDATE CASCADE is in the standard.

jfb · on May 28, 2013

Designating a candidate key as primary is a nasty SQL implementation wart.

lucb1e · on May 28, 2013

Liked them; not too hard but also not too easy. I'd have succeeded on the interview if I had been given the chance to test them (and if I wasn't too nervous about it I guess). Never had an interview with technical questions like this before; are you commonly given a chance to test them?

My database and answers dump (Warning: spoilers!) http://pastebin.com/HGBpemHn

slc · on May 28, 2013

The second question is actually trickier than one might think.

The obvious answer - something like

select Name, MAX(Salary) from Employees group by DepartmentId

is wrong.

twistedpair · on May 28, 2013

Call me foolish, but what about the following makes it undesirable? The question didn't ask about a null case of a department with no employees.

-- List employees who have the biggest salary in their departments SELECT em.EmployeeID, em.departmentId, MAX(salary) as salary FROM employees em GROUP BY em.departmentId

slc · on May 28, 2013

In your example, for each row of the result set

* "em.departmentId" will contain one of the distinct values from the "departmentId" column

* "salary" will contain the maximum value of the "salary" column of the table rows whose "departmentId" equals "em.departmentId" of the given result set row.

* "em.EmployeeID" will contain the value of the "EmployeeID" column of one the table rows, whose "departmentID" equals "em.departmentId" of the given result set row, but it is UNDEFINED which one. It IS NOT quaranteed to be the one whose "salary" column equals "MAX(salary)".

See here for examples of how to achieve what is actually needed: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-colum...

As I said, tricky, and, judging from the difficulty level of the other questions, I suspect that the authors of the article have fallen for it themselves.

twistedpair · on May 28, 2013

Thanks for the clarification. 5/6 and dunce hat for me :)

brown9-2 · on May 28, 2013

In an interview, it would be wise to mention the special cases that might exist and how you would alter your answer if you had to taken them into account, rather than waiting to be told of the special cases.

icedog · on May 28, 2013

I was wondering if the author realized this. Using Microsoft SQL Server, the answer would be to use a window function like row_number or dense_rank and then select where row = 1.

pikewood · on May 28, 2013

One initial step I use is to give a description of the problem, then have them actually design the table structure themselves. I then have a printed copy of the structure along with some sample data ready, so that I can refer to specific values (and also to help them visualize the contents).

codeka · on May 28, 2013

I find it kind of interesting that the first comment on the article says they wouldn't be able to answer the questions by they use an ORM. I'm not really a fan of ORMs, personally, because I don't think it's useful to try and "map" a relational model onto an object-oriented one.

lucian1900 · on May 28, 2013

There are some really good SQL libraries out there that make it easy to compose SQL and some of them also include an ORM built on top of the basic abstraction.

SQLAlchemy is a great example of this, I always get precisely the SQL query I would've written myself, except it's syntactically correct, easy to compose and has a chance of being portable.

EGreg · on May 28, 2013

It's an interesting debate. While I also feel that developers should know the underlying SQL, however all that stuff like joins, indexes etc. are actually very hard to scale beyond one machine. MySQL cluster does attempt to do it automatically, but even it has limits, and places most stuff in memory. In short, if I was looking for developers to do sharding, I would actually prefer to AVOID queries with joins, non-pk lookups etc.

Having said that, I have discovered a heuristic over the years: that if you are using an ORM, you probably don't want a relational database. You should learn something like Riak and let it handle the distribution and provide all the partitioning and availability for you. The CAP theorem shows that you can't get it all, and most likely you want to use one of those data stores instead of a relational one.

For regular sites that won't have millions of users constantly using it, though, a relational db is fine.

acdha · on May 28, 2013

“that if you are using an ORM, you probably don't want a relational database. You should learn something like Riak and let it handle the distribution and provide all the partitioning and availability for you”

These are not the same concept: a relational database makes sense when your application relies on relations between records. If you need to do lots of joins across many records, Riak is going to perform horribly because it's designed for a different problem.

CAP says nothing whatsoever about whether you want a relational or non-relational database, merely what tradeoffs you'll have to make to satisfy your business needs.

Using an ORM doesn't factor into this discussion at all other than for providing a convenient place to implement whatever system you devise to meet those needs.

EGreg · on May 28, 2013

As I said, it's a heuristic. If you find that you are telling your developers to use your ORM, then you probably should have gone with a NoSQL database like Riak. You can still do joins, etc. but it's in the context of things like map-reduce, and it makes sure that you can scale despite the joins.

MySQL way: SELECT * FROM a JOIN b ON x WHERE y

NoSQL way: 1) SELECT * FROM a WHERE x 2) Perform join in app layer or stored procedure.

Like it or not, when you scale you will lose one of the CAP, and NoSQL databases do the hard task of delivering an eventually consistent data store to you and letting you express yourself in the RIGHT context, which is not SQL.

acdha · on May 28, 2013

You're still conflating different parts of the stack: an ORM has nothing to do with CAP.

> You can still do joins, etc. but it's in the context of things like map-reduce, and it makes sure that you can scale despite the joins.

Either of your examples are commonly implemented in SQL databases, too: this is a routine MySQL optimization to avoid subselects and, amusingly, one which an ORM makes significantly easier to implement:

SELECT * FROM a WHERE x; SELECT * FROM b WHERE pk IN (…list of IDs from first query…);

Again, the SQL vs. NoSQL question is about your data model and access patterns, not whether you use an ORM or magical thinking about CAP. The line between the two has become quite blurry since there are things like MySQL-backed key-value stores or Postgres extensions which allow it to handle document-store workloads without losing performance or giving up the ability to do flexible queries. This isn't a question of religion: it's just looking at your business, assessing how well you know the access patterns (SQL systems are generally more flexible) and performance requirements and picking the best solution. Anyone claiming to have a right answer for everyone is wrong.

> Like it or not, when you scale you will lose one of the CAP, and NoSQL databases do the hard task of delivering an eventually consistent data store to you and letting you express yourself in the RIGHT context, which is not SQL.

You've now gone from wrong to very dangerously wrong: there is no scale which is immune to CAP and NoSQL has no magic for avoiding this. Eventual consistency is only appropriate for some problems and, as above, can be implemented on either system. No matter what storage system you choose you're still going to have to make careful decisions about business priorities and test carefully.

yread · on May 28, 2013

Much better than this question i got asked once:

"Which is faster: select from a table or from a view?"

clubhi · on May 28, 2013

I don't see why that is a bad question. A reasonable answer is selecting from a table. Of course it depends on many factors. I often ask questions like this just to get the candidate to tell me why there is not an absolute answer.

yread · on May 28, 2013

Yeah that's the thing. I was talking about computed columns in tables, materialized (or indexed) views, about measuring stuff by experiment and about measuring stuff that actually matters but the interviewer seemed like he wanted to hear a simple answer

chris_wot · on May 28, 2013

Unless, of course, the view is a materialized view.

jfb · on May 28, 2013

If a candidate can talk intelligently about materialized views, I think we're past the "explain HAVING" stage of the interview.

wambotron · on May 28, 2013

This is the perfect interview test. It's easy enough that a candidate can roll through it relatively quickly, but deep enough to prove they have the experience they claim to have. Kudos on having a smart interview test!

jcampbell1 · on May 28, 2013

It would be really interesting to see solutions for MongoDB. These questions are designed to be easily solvable with SQL, but it would be interesting to see how this can be solved with a totally different technology.

free · on May 28, 2013

They seem to be quite simple. For what profiles are you asking these questions?. I have used very similar questions for dev ops role.

jacques_chester · on May 28, 2013

They're good questions because they go past the two simplest query types:

    select from ... where ...;

and

    select from ... join ... where;

Those are the types 99% of programmers who use SQL for simple CRUD apps know. But they come up short for asking more useful business questions.

Eyeballing the list, it tests subqueries, GROUP BY, HAVING, OUTER JOIN, IN/NOT IN and SUM. Fairly useful primitives for general query writing.

I'd try to add a question that relies on UNION, INTERSECT or EXCEPT.

j-kidd · on May 28, 2013

For interview question, the keyword I like best would be EXISTS. Sadly, many developers write SQL for years without knowing its existence. If I am not mistaken, the propel ORM of symfony doesn't even have native support for it.

As shown by codegeek, the 6 questions here can be answered without needing sub-query. Maybe we can add something like "List employees who are not working alone in their department"

codeka · on May 28, 2013

You'd be surprised how many people claim knowledge of relational databases but who could not answer those questions.

arb99 · on May 28, 2013

Yeah they seem quite obvious really. Serious question: is this really the types of questions asked for a dev job interview? (was still interesting to see.)

>List all departments along with the number of people there (tricky - people often do an "inner join" leaving out empty departments)

inner join seems the non obvious way to do it really IMO.

    select 
    departments.name as "department name",  
    (select sum(salary) from employees where employees.departmentid = departments.departmentid) as "department total salary" 
    from departments

jacques_chester · on May 28, 2013

The empty department remark is the clue. They want all departments; if it's an empty department they still want to see that it's an empty department.

An inner join will hide that row because there's no equality between a set (departments) and an empty set (employees in that dept, of which there are none). A correctly structured outer join will.

jfb · on May 28, 2013

They're looking for an outer join, I think.

pessimizer · on May 28, 2013

Coincidentally, this is also the list of questions that I need to ask anyone who is recommending a particular NoSQL solution.

hgezim · on May 28, 2013

Here's a SQLFiddle to try the questions out :)

http://sqlfiddle.com/#!2/9a84e

_pmf_ · on May 29, 2013

That's perfect!

jrockway · on May 28, 2013

Plural table names? <irritated hipster sigh>

geekymartian · on May 28, 2013

wait, they have somebody doing only SQL ?

brixon · on May 28, 2013

There is someone that works indirectly for me and that is all they do. A lot of data warehouse and DTL stuff, so SQL is their life.

LekkoscPiwa · on May 28, 2013

<Blunt> I'm not a SQL Developer, but anytime I get questions for which Google has answers to be found in 5 minutes or less, I'm quite hesitant to work there.

I respect interviews that go along the lines of: "What would you do if..." after giving a detailed description of their environment. But then again I'm a tools/OSes admin, so maybe it makes more sense for my job description. But anytime guys are too focused on third option you can give to the 'ls' command and don't ask real world questions related to potential or existing issues they have/had, I'm not really interested. I can use google, you know, if that answers your question.

I like questions where the interviewer can see years of experience not the amount of detail memorized from a manual.

I would imagine questions that are good to start with: how you as a developer usually start a design of a database? How do you plan it? I got this question once, it's really good. Can't answer that after reading sql book two days earlier. In contrast to his question. </Blunt>

jacques_chester · on May 28, 2013

These aren't "details memorised from a manual". These are fundamental primitives of SQL queries. You simply cannot express most serious questions without them.

If someone came to me and I asked them to write fizzbuzz, an answer of the form "I would google if-thens and modulo and print statements" would be a pretty obvious no-hire.

LekkoscPiwa · on May 28, 2013

I would imagine questions that are good to start with: how you as a developer usually start a design of a database? How do you plan it? I got this question once, I thought it was really good. On the spot I could tell the engineer who asked me that is experienced. Can't answer that after reading sql book two days earlier. In contrast to the questions from the post.

jacques_chester · on May 28, 2013

In a business setting, the conceptual/relational, logical and physical design of a database happens far less often than querying such databases.

Indeed, one of the reasons normalisation is such a Big Deal to relational bigots like me is that it makes SQL's querying tools much more useful and versatile.

k__ · on May 28, 2013

True story, but such questions are just senseless.

If someone knows about set-theory and the ideas behind query-languages, knowing SQL isn't a deal breaker, it's just a nice to have.

jacques_chester · on May 28, 2013

Folk generally don't study relational algebra, on its own, for fun.

jfb · on May 28, 2013

SQL is one of those technologies that caps how much enjoyment you can derive from understanding the related theory. Scheme, for instance, never stops giving, but you can hit a point with SQL where you start to wonder: where's my Tutorial D?

jacques_chester · on May 28, 2013

And where are my proper temporal primitives?

It sucks, but it's what we have for now.

chris_wot · on May 28, 2013

They really should.

jfb · on May 28, 2013

If someone knows set theory, they're likely not interviewing for a LOB CRUD job banging out Rails models.

pessimizer · on May 28, 2013

What if you're not looking for an experienced data modeller, but someone who can write SQL?

Also: If you can read the SQL book two days earlier and answer these questions in a reasonable amount of time - sounds like an insta-hire to me. There's certainly nothing so hard about SQL that you couldn't do that - but some people who spend a lot of time working with databases seem to find the climb insurmountable.

jacques_chester · on May 28, 2013

When I learnt SQL, the part that took me the longest to square away was the distinction between WHERE and HAVING. It was thoroughly confusing at the time.

Once you grasp that conceptually, SQL generally executes left-to-right, it's easier.

siddboots · on May 31, 2013

I avoid `having` wherever possible these days. There are data analysts in my team that I know would have to think twice about it if they encountered it in my code, and to be honest, it's a bit of an wart in the SQL grammar anyway.

In (almost) all cases, a subquery with an extra `where` clause will have equal performance to a `having` clause in the main query.

See the final example in: http://en.wikipedia.org/wiki/Having_(SQL)

jfb · on May 28, 2013

If my datastore is a SQL database, I don't want you modeling anything if you can't answer these questions.