Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Our SQL interview questions (jitbit.com)
194 points by jitbit on May 28, 2013 | hide | past | favorite | 216 comments


I took a similar test, on-line while being watched. 4 sets of 10 multiple choice questions: SQL, unix commands, vi, & HTML. Make your 10 choices, click submit, get your score. It was kinda silly, but what the heck...

It was ridiculously easy and I got all 40 right without much thinking, as many people here would also, I imagine.

Then I asked, "Why bother with this after reading my resume?"

They answered, "We have to do this. We've interviewed 52 programmers with resumes similar to yours and no one else got them all right. In fact, the highest score before you was 32."

Wow. Is this the state of our industry now?


It's the state of the bottom of our industry. Those 52 programmers aren't a representative set of the general population. Those 52 programmers are the ones that can't catch on for any jobs so they keep applying over and over. It's a sampling bias.

And by the way, this happens for most industries, not just technology. McDonald's has the same problem. Their majority of applicants fail at tasks like having the literacy to fill out an application form or getting out of bed and showing up to work. This doesn't mean the majority of the population is that deficient, just the majority of deadweight floating around the would-be job pool.

More generally stated, the worse an applicant is for his desired job, the more times his incompetence will attend interviews to be seen. Quality performers in any business get hired quickly and don't stay in the interviewing pool. So equivalently, any interview pool will consist mostly of bad candidates.

We need a name for this effect so that we can just quote it whenever this topic comes up, like Dunning-Kruger. Anyone got a good suggestion? Joel Spolsky was the first to set it out well and become widely read[1] , but Spolsky's Law is already used for the Law of Leaky Abstractions.

[1] http://www.joelonsoftware.com/items/2005/01/27.html


After looking for a new job recently, I realized the converse is also true. The worse a company is as a place to work the more often they need to hire. Therefore the pool of available jobs mostly consists of jobs no one wants. Be picky in who you hire and be picky in where you apply.


> We need a name for this effect so that we can just quote it whenever this topic comes up, like Dunning-Kruger.

Sounds a lot like the Market for Lemons:

http://en.wikipedia.org/wiki/The_Market_for_Lemons


Took a similar test at a recruiting company in ... 2006 (IIRC). Mostly on PHP. And the test was wrong. IIRC, I got 24 out of 25. They were ecstatic - "wow, no one in this office ever got such a high score - you were almost perfect"

"I was," I said. "One of those questions is wrong".

"Oh, no, it couldn't be - we have a team of experts who create these to the highest standards, blah blah blah.".

I asked again to go back, take a picture (crappy camera phone) and reviewed it again when I got home. It was wrong. Something to do with how references behaved in PHP5, and the exam's answer was right for PHP4, but the test was for PHP5.

Anyway, it was a bit depressing, and I don't want to have to go through those sorts of tests again if I don't have to.


"One of your questions is wrong" is an excellent filter. People should be thrilled to get that information.

I corrected two questions on a multiple-choice test that a finance firm gave me. They flushed me out after I handed in my personality test, so maybe it was for the best.


Yeah, you'd think it would be, but it's also indicative of someone who might be 'too big for their britches', or some other such nonesense. Given how many fakers I've encountered doing crap work for people (not just in software, but any service), I'm sadly inclined to realize that when I say "one of your questions is wrong", I'm very likely to be dismissed.


In the early 2003 time frame, I ended up taking a bunch of multiple choice tests about Java, C and C++. Every single one of them had things wrong. As near as I can tell, the auto-capitalization feature of Microsoft Word caused most of these problems. For example 'boolean' and 'Boolean' denote very different types in Java. In one test, they meant 'boolean' but had 'Boolean' written out.


I would definitely be thrilled. I wonder if intentionally putting an incorrect assumption in order to see if someone brings it up would be a good indicator. The idea wouldn't be to penalize someone who didn't notice the assumption, but to reward anyone who noticed the mistake and brought it up.


> Then I asked, "Why bother with this after reading my resume?"

You're aware that not everyone tells the truth on their resume right?

> Is this the state of our industry now?

Now? It's always been this bad... Happens when demand for talent out strips supply - the bar keeps falling.


"Wow. Is this the state of our industry now?"

Traditionally this happens when HR demands "10 years experience with windows server 2007" and you somehow snuck thru (or maybe worked at MS on the dev team, or were in a beta program or...) anyway by definition the only people making it thru the filter are going to have a weird/cool/interesting background or are going to be pathological liars who inevitably will only get 32 on the 40 point test, because they filtered all the real applicants out by applying a bad filter.

I guess the SQL analogy would be something like:

SELECT COUNT(*)

FROM applicant

WHERE applicant.skill > 40

HAVING applicant.skill < 10 AND applicant.honesty < 0


Without seeing any real data, wouldn't that all be a WHERE clause? I wouldn't imagine the skill was an aggregate of the applicants.

I've just skimmed over this though, I'm not having a go


Ah I was trying to imply two sets of filtering and probably have them reversed, where HR is filtering on lack of honesty which only implies lack of skill or at least average skill, whereas the secondary filtration (the having clause) would imply the test for skill.


Don't you need a group by clause in order to have a having clause?


This is yet another of those things that gets the pgsql folks all wound up about mysql. mysql is generally permissive best effort rather than restrictive follow the spec, so HAVING is allowed to reference stuff not in a GROUP BY or an aggregate (like MAX or COUNT). As you imply, this is not allowed by the SQL standard so philosophically I would Strongly Expect pgsql to error out unlike mysql. I donno what ms-sql does, don't use microsoft stuff. Oracle costs too much, so another I donno.

About 99% of the noise about mysql vs pgsql boils down to this overarching philosophical different of "try yer best" vs "only perfection is permissible". There are minor other differences aside from that, none of which I can remember at this time.

I was mostly trying to make a joke and making the psuedocode kinda sql inspired rather than cut and paste into a window like a stack exchange answer. I could have implemented it "properly" as a nested subquery I suppose. Or to make the point a little more .. obviously, just "select 0;"


That's messed up. "Having" should only be used for filters on the output aggregate functions, and "where" should be used for filters on the input row data. If mysql lets you use "having" when you mean "where", that is unfortunate.

example:

    select count(1) cnt, department
    from sales
    where department_id in (1, 2, 3, 4, 5)
    group by department
    having count(1) >= 100;
So, it filters out all the input rows to only those department ids, and then it filters out the aggregate output rows to only those with a count() of 100 or more.

This is how Oracle and MS-SQL server work.


In SQL Server a HAVING without GROUP BY is a way to filter out duplicates.


It seems to error out.

  select name, count(*)
  from queries
  having count(*) > 1
Column 'queries.name' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


Like this. There is more than one way to do it:

  -- List employees who have the biggest salary
  -- in their departments
  select
      Name
  from
      Employees e1
  where
      exists
      (
          select
              1
          from
              Employees e2
          where
              e2.DepartmentID = e1.DepartmentID
          having
              max(e2.Salary) = e1.Salary
      )


It's been like that since at least the late 90s when the spike in wages versus most other jobs both increased the financial benefits from overstating your experience and decreased the cost of having to find a new job if you were actually caught, which is surprisingly uncommon.

One major factor: companies delegate a great deal of hiring to recruiters, in part because it's expensive to maintain in-house experts. Since recruiters are usually paid by the hire rather than for sending qualified applicants there's pressure to simply use the shotgun approach of sending as many remotely plausible applicants over and hoping one of them will be hired rather than spending the non-trivial amount of time needed to find a great fit. Sometimes resumes are even altered by the recruiter to add required skills – and that probably does work well at large organizations which either don't hire well or where HR tosses every word they've used in past listings into the job description.


I was once tasked with hiring some developers. HR used filtering software, so every candidate had C++ on their resume even though we didn't need that. Out of curiosity I asked each candidate a simple C++ question. Not one had a clue.


You've heard of the FizzBuzz test, right?


I heard about it in the last year or so here on HN, otherwise I wouldn't have known what a FizzBuzz was. I've been coding for 20 years professionally and 30 for fun. I've never coded a Fibonacci either. I think colleges need to teach how to code a microcontroller to do something, build a multi-platform application, build a database application, set up a CI server, etc.


We use FizzBuzz as a very basic first filter in our hiring process. It saves us a lot of interview time considering some 60% of candidates fail (or give ridiculously over-complicated implementations). Resumes are nothing but an exercise in creative writing, it seems.


Fibonacci? That's a recent invention, right?


not sure what you mean?

coding a simple function for evaluating the fibonacci sequence is a reasonable alternative to FizzBuzz that allows for some slightly more sophisticated requirements: like "code a recursive function that evaluates the first N members of the fib sequence. use memoization in your implementation and show how this runs in O(n) time complexity."


I think he was joking based on the fact that the Fibonacci number is at least an eight hundred year old concept.


Maybe the previous person was an emacs guy? Seriously, what would be a valid reason to sort employees based on which specific editor they are habitually using?


> Seriously, what would be a valid reason to sort employees based on which specific editor they are habitually using?

It's not for filtering out really good candidates, it's for filtering out abysmally bad candidates. Show me the dumbest Vi user and the dumbest Eclipse user you can find and you might learn something.


And people who've never used vi are ipso facto abysmally bad?


Ignoring ed, vi is the only sane editor that you can depend on to be installed on every system you work with. Even if you know emacs, it's common to at least be familior with enough vi basics to fix a broken system (hjkl, i, :q, :w)


The original interview was claimed to be for a developer position. If you're a developer in most companies, you never have to handle a random broken system, as usually any and all deployment or production maintenance is separated. DevOps is still an exception, not the rule.

You manage your workstation and your dev/test environments or VMs at most - they have the exact editor setup you like. The only interaction between your computers and "foreign" systems is the code version control system. Your editor, no matter how rare or exotic, is guaranteed to be installed on every system you work with - if you work on your systems, not manage systems of other people.

Even the OS doesn't need to match. You can easily code for Linux deployments on a MacOS or Win machine, and never touch any Linux computers. Heck, you even can code for Windows deployments on Linux machines, though sometimes testing that may be a mess and requires a VM - but you certainly can do that.


Unless you have the misfortune of having to work with windows systems :)


> And people who've never used vi are ipso facto abysmally bad?

A few false negatives might be acceptable.


Jeff Atwood has written about a similar experience in "Why Can't Programmers.. Program?": http://www.codinghorror.com/blog/2007/02/why-cant-programmer...


We also set up an online test to screen candidates. Because HR and the recruiters kept sending us such terrible candidates.

I just can't understand how people don't score 100%. Alas, not many do.

It's saved us devs a HUGE amount of time.


I dislike the whole interview process, I'm only young so the interviewing process is still very new to me. We find that people lie heavily on CVs to the point where it is annoying. We had someone once who had 5 years CSS experience yet found it hard to center some text.

I think an online test would help our cause some but I don't know how simple to make some of the questions. How long would you expect the candidate to be taking the test? Do you tailor the test to the candidates?


It's dunning-kruger and lying in full swing, I feel.

Some people go like "Oh I worked 5 years in that tool, I must be magnificient in this tool. I shall list myself as highly professional in this area!" and do so.

Me... I rather go like "Hmm... Well I worked with C for 3-4 years, but I'm rusty as hell... and I didn't touch many libraries much, I just worked on tiny microcontrollers and implemented a highly multithreaded programming language execution environment. Guess I'm gonna call me <somewhat experienced> there." Though afterwards I just roll over people in an interview.


The positions on my team usually call for people that have a strong mix of business and technical skills. We get business analysts with programming that they've picked up along the way or programmers with MBAs. Typically, when the programmers get dropped from the process, it is when we test their SQL. Sad, but yes, this is the state of the industry.


And yet people still ask if automated testing is needed. I heard there's a new research project being funded called "People: They make mistakes. Again. And Again."


"List employees (names) who have a bigger salary than their boss"

    SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN     Employees e2
    ON (e1.BossID = e2.EmployeeID)
    WHERE e1.Salary > e2.Salary

"List departments that have less than 3 people in it"

    SELECT d.Name, COUNT(e.EmployeeID) FROM Department d LEFT   OUTER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name HAVING COUNT(e.EmployeeID) < 3
"List all departments along with the total salary there"

    SELECT d.Name, SUM(e.Salary) FROM Department d INNER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name

"List employees that don't have a boss in the same department"

    SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN        Employees e2
    ON (e1.BossID = e2.EmployeeID)
    WHERE e1.DepartmentID <> e2.DepartmentID

"List all departments along with the number of people there"

    SELECT d.Name, COUNT(e.EmployeeID) FROM Department d 
    LEFT OUTER JOIN Employees e
    ON (d.DepartmentID = e.DepartmentID)
    GROUP BY d.Name


Curious question: why do you always use table aliases? To keep your query shorter? When I don't need an alias I just use the full table name for readability:

  SELECT
    Department.Name,
    COUNT(Employees.EmployeeID)
  FROM Department
  JOIN Employees
    ON Employees.DepartmentID = Department.DepartmentID
  GROUP BY Department.Name
  HAVING COUNT(Employees.EmployeeID) < 3


Good question. I always find it easier to have the aliases because sometimes, table names are too long for me to remember. Also, tehre are times when we join the same table by itself and at that point, I use x1, x2 etc. In general, aliases always work while direct table names may not work for all cases. So i just keep it simple.


There's also the cognitive overload of reading long identifiers, multiplied by the naming conventions of some large corporate databases.

I'd rather see:

    EmployeeReferences eFrom JOIN EmployeeReferrals eTo
...and then see

    ON eFrom.ID = eTo.ID
    ...
    JOIN xyz
    ON eFrom.Source = ...
rather than have to read acres of EmployeeRe-something 4 or 5 times through an 8-table BI join.

Similarly, I've had to deal with (admittedly legacy) tablenames like A12R18SALE and A12B14PROD. Aliases come in really handy there.


I always use table aliases as a matter of habit:

* I always specify columns as table.column, not just column as it makes things explicit where the could be ambiguity if a less experienced coder is looking (I know that column reference in a correlated sub-query refers to the most local instance of that table, but having the table name there explicitly states that referring to that was my intention and not an accident). Having short aliases saves typing in this instance (though not too short/arbitrary - the object names should still be meaningful in the context of the query: a, b, c, d, ... would generally be bad aliases)

* If the query gets more complex and needs to join objects in that have columns of the same names as those in existing objects (especially if you add another reference to an object already in use in this query), you've already got the aliases there for the first instance reducing the chance you'll get one wrong when adding them in for both instances of the same name.


I use to use full table names if only to avoid inconsistency with aliases in other code. Now I use aliases and ignore inconsistencies. Being a purist wasn't worth it.


Too much typing.


Get a better IDE ;)


You assume that all people have bosses. The top boss has no boss. You also assume that all department names are unique.

Nitpicking, yes, but these questions certainly allow for a deeper discussion with the interviewer.


You assume that all people have bosses. The top boss has no boss.

Isn't that why a left outer join is used?

Besides, if you are referring to question #1, a person without a boss can never be a part of the set of employees who have a higher salary than their boss.


Two assumptions to be clarified in the interview: 4th question above doesn't apply to the top boss (who can't have a greater salary than a boss he/she doesn't have, so no issue on the 1st question). Department table has an alternate key on department name.


> people often do an "inner join" leaving out empty departments

Empty departments have less than 3 people


correct. Edited.


I was following along (without peeking ahead) and I briefly thought "What about NULLs and empty joins?" But I figured, it is an idealized test. For example, what happens when a boss has a NULL department id? Would it be safe to say that they are in a different department than their underling? SQL says no.

Besides that, I think this is a great test. Personally, I start off a bit slower so I don't embarrass people that don't know SQL.


My own answers, with test data: https://gist.github.com/Pluies/5663135

Despite thinking I knew SQL reasonably well, I wouldn't have fared very well at all in an interview setting. :/ Took more time and googling than expected.


didnt google but I had to create some example data to think about the solutions. So yes, i would not be able to do it in 5 min.


I'd give someone props for getting these questions all correct in half an hour, even if some could do it sooner.


Probably not the best idea to group on the Departments.Name instead of Departments.DepartmentId ..


On the first and fourth queries you don't need LEFT JOIN because the WHERE clause guarantees that only INNER JOIN rows will match. (Presumably neither question applies to the top person.)

On the third one you don't list all departments; the empty ones are filtered out. Needs a LEFT JOIN.


The question List all departments along with the number of people there has an answer using a correlated subquery, rather than a join.

I have a relevant story about that.

About 9 years ago now, another developer escalated a bug to me. Every time they ran a complicated auto-generated query, they got logged out of Oracle. No way! I tried it. Happened to me. Began trying to narrow it down. Ran out of connections. Got a DBA to unwedge the machine. Began again. Ran out of connections again. Got the same DBA to unwedge the machine. Received a lecture about not opening so many connections, replied that I was tracking down a bug and had no choice. Showed him the bug. He was astounded.

Not long after I finished tracking it down and sent them the fix. Showed it to the DBA who verified that it had been reported already, and was fixed in the next release.

The bug was that any time there was a correlated subquery with no records, Oracle logged you out. My guess is that something, somewhere, followed a null pointer. The obvious solution was to move to a left join. If I remember correctly, the way it was being autogenerated made that hard. My solution was to have a correlated subquery which was a left join on DUAL so that there was always a record.


Relevant story? You're hired!


These questions are very simple, though I guess they cover a few of the core concepts. Basic selects, joins, joining the same table twice, left joins and group by.

I'm most worried by the comment "(tricky - people often do an "inner join" leaving out empty departments)". That's a basic question and if that's considered "tricky" you've got a real problem on your hands.

Maybe if you're hiring for a junior position you could excuse someone not knowing about left joins. If it were for a position that had any sort of focus on db work I would pass on the candidate (caveat, when hiring juniors I look for desire to learn above most everything else).

Obviously I'm getting old. "Back in my day" a basic understanding of SQL was just part of the job. Didn't matter what you worked on - you should be able to work with relational database. I'm concerned that the attitude of "I don't need to know that - my ORM does that for me" has become the default outlook. Over the last few years I've had to convince developers several times that the complex aggregation they're writing in their script would be easiest solved by using SQL. Unfortunately, increasingly it seems that newer developers aren't even aware that these tools are available - or how to use them.

If nothing else, relational algebra is a wonderful and elegant subject that is worth learning.

Darn kids, get off my lawn! :)


> That's a basic question and if that's considered "tricky" you've got a real problem on your hands.

If they aren't warned then it's reasonable to assume that every department has employees. Otherwise why would it exist?


Fair point. Experience has taught me to internally question whether or not each relationship should use a left join or not. The fact that it says "List all departments" made me think it should be a left join. In fact, to me that screams "left join".

Really, in a philosophical sort of way, it's the use of the query that determines which join to use. They want ALL departments, you give them ALL departments. Using a left join you protect yourself in the future - with a performance hit. As you say, if there was a guarantee that the data didn't contain empty departments you'd use an inner join.

Most importantly, a reasonable developer should know to ask the question - if not of the examiner, at least to themselves. Just making an assumption is not the right approach.


It is a good way to expose real world experience.

Client requirements are usually vague enough (heck, internal requirements are sometimes vague enough) for there to be problems like this discovered later. or a given business it might be valid for there to be departments that are empty at a particular time.

Someone with extensive real world experience will know to never assume any detail not included in the specification no matter how much of a non-brainer that assumption might seem at the time. They will either ask if empty sets need to be considered, or they will preface/suffix their answer with "assuming there are no empty departments or you don't want to report on them if there are" or "assuming there might be empty departments and you want them included in the report" - either way they are showing an ability to parse requirements and identify possible ambiguity that should to be queried.

You've got a multi-level differentiator there:

* The bad candidates will not be able to give a working answer

* The fine candidates will give a working answer though might miss the exact requirement (as it isn't properly stated and they just assume)

* The best candidates will spot the deficiency in the spec


I can think of two business reasons straight off: 1) a department changed its name, or 2) the department used to have employees but is now defunct, such as the Typing Pool department.

My point is, to try to be aware of the assumptions that you are otherwise unconsciously putting in to your program's business logic.


I'm an aspiring junior developer, and I found Stanford's (online) "Introduction to Databases" course to be excellent. I particularly enjoyed the relational algebra section where we wrote relational algebra queries by hand.

From the newbie's perspective, sometimes it can feel as though beneficial resources on the basics of subjects like relational algebra, SQL, and relational databases are somewhat sparse.

Unfortunately, I did not have enough time to finish the class completely (it required, as it should, a significant amount of time dedicated each week), but I'll definitely be re-enrolling next semester as long as it is offered online again.


I feel exactly the same way. I'm the first one to admit that an ORM can save you a huge amount of time for basic database stuff. However, as queries get more complex I always write them in raw SQL. It's just faster for me. I'm shocked that someone who applies for a programming job wouldn't get these questions right. They're not exactly rocket science, right?


I have just been on the job market looking for a senior PHP position. There were so many companies that requested tests from me. Either in the form of online tests, or tasks which I had to complete and return. While I do understand the need for them (having had to hire other developers), some of the requests were quite outrageous.

1 particular company basically wanted an entire application to be developed in an evening, and I was giving strict instructions to focus on security and not allowed to use external libraries. After submitting this elaborate task, I was still criticized on using PDO (which is standard with PHP...).

IMHO, sometimes the lengths employees go through to find a developer are so ridiculous that they actually drive away people.


I recently took an SQL skill assessment test from one of the big 'testing' sites. My first problem with the test was that it was a mix of Oracle and MS SQL, when my resume said 'MySQL'. And there were questions such as 'What is the MS SQL equivalent to the Oracle keyword xxxx?' Luckily I've used it enough to not bomb that portion. To be expected with a recruiter...

Anyway, some of the other questions were pretty silly like "Which of the following is a DDL command?", and many were SELECT statements with a syntax error that you had to pick out, and probably the one question that made sense was about the difference between WHERE and HAVING.


If their recruiting is so incompetent, maybe the company is clueless otherwise as well?


This is a great way to start... A defined schema and very clear and simple requirements, I could dash these off really quickly, and then we could discuss optimization. So much better than asking me to define a join. I miss SQL. http://stackoverflow.com/search?tab=votes&q=user%3a35398...


Seems to me to be a solid test, though I might include a small amount of sample data to nudge them in the direction of some of the potential issues (empty departments and so on) - maybe I'm just kind like that. ;-)

In interviews I've always been amazed how few people who claim to know SQL can use GROUP BY, HAVING and aggregate functions (or depending on the question self joins or sub queries that will allow them to achieve some of the same things).

My normal question is to present them a table with a level of duplication ask them to write something in SQL that identifies the duplications and something that removes them which covers much of the same syntax.


Removing duplicates is dependent on the underlying database because the behavior of deleting while selecting varies. I know MySQL doesn't allow it in most cases, so you need a temporary table. Identifying them is a fine question.


You can get a list of dupes to eat without a temp table with something like:

select min(id), count(*) as dupecount from yer_table group by some_hash_identifier_or_whatever having dupecount > 1

And then just iterate thru delete from yer_table where the id = the min(id) as fetched above.

Or maybe your business logic is to keep the oldest record and zap the newest. Or based on some column data rather than simply age.

Now the really interesting discussion is how often this happens (like once-off, or every 10 seconds, or), and how scalable you need it to be. Are you talking about 100 records or 100e6 records. Also literal duplicates as in "two" works pretty well but not so good if there's 50 duplicates and you need to delete 49 of them. Of course for 49 duplicates you could select the identifier hash and the lone lowest ID and delete all entries with the same identifier hash where the id isn't the lowest id for that hash...


I give a basic single table schema with sample data so they have all the information they need to work with.


If you can only answer these using an ORM, then you really shouldn't put knowledge of SQL on your resume.

I am not really into giving programming tests to interviewees, but if you claim to have experience with something you should be able to answer simple questions about it.


I wouldn't treat this sort of thing as dispositive, and if I were doing hard-core SQL development, I'd dismiss it entirely and start the interview with much hairier wizardry; but for a generic, gonna write some queries but mostly live outside the database kind of role, the five minutes or so this sort of test takes at the beginning of the interview gives me a strong indicator of how to assess what the candidate actually knows, rather than what is represented on their resume. It is a guide for the actual meat of the interview.


These are nice, but you need simpler questions.

Hear me out: Take away two questions (the first four are enough anyway) and add two to start out with that are much simpler. You will be shocked how many people will fall out at that level.

Years ago, when I first started hiring, a friend told me about this and I didn't believe him, but I tried it anyway. I was astounded how many people were completely bluffing. It helps expedite the whole process.


This comment was very good:

"This is exactly why reliance upon ORMs has had a huge negative impact on engineering. Most of these are easily solved with Group By, Having, and/or other aggregate functions, but the ORMs have created this veil of complexity."

If ORMs really simplified the underlying complexity, so I didn't have to think about it, then ORMs might be worth it, but I have never worked on a large project where, at some point, I was wholly free of the underlying technology. If its a project that I work on for a year or more, there is always some moment when I need to drop down to SQL.


This is much more useful than the typical question I've received at some companies - "On a scale of 1 to 10, how would you rate your SQL knowledge?"


I've had this and have answered something like:

"Compared to most other devs I work with, 7 or sometimes 8. Compared to people who do SQL for a living, possibly a 4, or a 5 if I'm feeling cocky. It all depends on what the '10' really represents - the best of the best, or the best of the people I'd be working with."

Well, something like that. That last bit - I've said it more tactfully in the past.


This is the only way to answer this question.


Thanks :)

Said wrongly, it implies that the company has crap people working at it, and that's generally not a way to get hired. But most companies know they're not getting the 'best of the best' when they hire - they're getting the best they can afford in their geographic area during a certain time frame.


When I was interviewing, I was told to think carefully about my answers to the "rate yourself 1-10 on your Python/Java knowledge" pre-screen questions.

If you answered 10, you just might find Guido or Josh Bloch on your interview panel.


This seems like a great sign of the overall condition of the company, or at least their hiring practices.

Has there ever been a great company or great interviewer who would seriously ask this question?


Everybody answers 8.


Somewhat surprisingly most web developers I know know very little SQL, having picked it up exclusively by tinkering to get things done any way whatsoever, even if clumsy or slow. In fact SQL might look deceptively simple at times, at one point I read an ANSI SQL book so I already had some formal education in SQL when I started doing webdev, but I only really learnt ANSI SQL at the university in the databases course, and then I still had to do more learning about many details of my DB server of choice (postgres), including things like spatial queries and indexes, full-text search etc., you can get huge speed ups and infrastructure simplifications by putting those kinds of things directly in the DB.

Ask people about difference between LEFT JOIN and RIGHT JOIN, or using the schema from the article, to select all attributes of employees with the highest salary in their department in pure SQL and you will see how much or how little people know, in fact many webdevs don't even understand JOINs at all!


Which book would you recommend for learning this stuff? I'm not very interested in 800-p. gorillas; there surely must be something short, not too theoretical, and to the point.


I enjoyed SQL Antipatterns by Bill Karwin. Very easy to read and offers some practical approaches to common issues.


Bill Karwin's "SQL Anti-Patterns Strike Back" presentation on Slideshare.com presentation is worth checking out -- it's 250 slides long, covers 4 kinds of anti-patterns (in queries, DB creation and the design of both DBs and applications). And he goes through actual code examples.

Check it out here:

http://www.slideshare.net/billkarwin/sql-antipatterns-strike...


A little theory goes a long way in SQL.

Pick books focusing to SQL, not on databases overall.

Most database textbooks (Date, Ramakrishnan & Gehrke) cover a lot of ground, rather than just the language.


I'd recommend Head First SQL (http://www.headfirstlabs.com/books/hfsql/). I started on that book with no real coding knowledge whatsoever (except a long-forgotten Java class that I took circa 1996). Plugging away on that book a few hours a week for a few months taught me enough SQL to get my foot in the door to a new career.


Ben Forta's Teach Yourself SQL in 10 minutes http://www.forta.com/books/0672336073/


My first weed out question is asking them to describe a Left Outer Join. They don't have to get it exactly right, I just want to see if they ever did anything more than a two table inner join.

For a Web Developer the first weed out question is to tell me the difference between a GET and an POST. Here all I really want then to know is that a GET is what generally see in the URL and a POST is commonly what you see in HTML Forms. I want to see if all they ever did was ASP.NET WYSIWYG web development or if they actually know something about the internet.

These two questions can be done in a phone screen. The faster that you can weed out people the cheaper the hiring process is.


Careful using that terminology. You'll get false negatives on people who still think in terms of *= syntax, and will tell you that the words "left", "outer", and "join" are all redundant and probably don't belong in SQL in the first place.

I used to be one of those guys, but I'm much less grumpy about it these days so I'd still pass your test. I have a sad suspicion that I'm on the progressive end of the spectrum when it comes to guys who deeply understand SQL.


Yep, I've done a lot of PL/SQL programming, and we've always used (+)= syntax and joins with all the tables after commas and all the joining conditions after WHERE.

Now I work on different project and we use join syntax, but I could easily imagine people that do joins all day, and not know JOIN .. ON .. syntax.


I came to Oracle after it adopted the ANSI syntax, so that's what I use. So my experience is the opposite of yours -- when I see the (+) I need to look up the syntax to remember if it's left or right outer.


And then I ask, hey, where's my BOOLEAN? And then I drink.


Oh god.

And the lack of a serial/autoincrement/identity type.

So. Many. Effing. Triggers.

And 32-character identifiers.

sigh


It's so much better to use a database that only allows one autoincrementing value per record, or one TIMESTAMP and then only allows you to have either a create timestamp or an update timestamp without writing a trigger.

I prefer the way Oracle does it, you may have to do more work but it's more explicit and flexible that way.


I don't. I prefer for the common case to be correctly and automatically handled for me.


True, I use it for a conversation starter. You can tell the split second after you ask the question if you will get any kind of reasonable answer.

I personally have used the *= syntax more than the LEFT syntax, but that concept has helped filter people since I am not allowed to "test" people.


Obviously a GET has variables and a POST doesn't (that's why they are called "GET variables")

[I've encountered this belief more than once working with PHP developers. I would hope that the answer was closer to something demonstrating knowledge of HTTP as a protocol.]


Good point - that they don't have to get it exactly right. For me, it would be enough if the candidate is just aware that there are different types of Joins for different scenarios.


Interesting, so I think I'll pass your first screening even though I'm not a (web) developer but (most of all) a system/network engineer :D


If the position you're filling is directly dependent on more-than-average SQL experience - creating a DB driver, an ORM, for ex. - then SQL-specific questions are applicable. But, by and large, this type of specific-knowledge testing is not very useful. I want to see a developer's general abilities at problem solving and the source code to back it up. If you have solved complex problems in C# - and can prove it - then you certainly as hell can solve complex problems in Go despite not having any experience there yet. Sure, if I'm trying to fill a Go position and someone has proof they're an excellent developer _and_ it's in Go then they'll get top consideration.

Being able to write SQL queries from memory has little correlation to a candidate's level of ability. Personally I consider myself a fairly strong developer and it hasn't been only until the last year that I can now write pretty complex joins from memory. And I've been developing for 20 years. Only because of a recent project and the volume of queries I had to write did my method change from using a graphical query writer to simply memorizing the syntax I need. Indeed, this very type of adaptation is something I look for in candidates.


If you can't handle these sorts of queries in a forgiving interview format, then by definition you are not a strong developer in SQL. That is not to say that you are not a strong developer in general; or that you couldn't handle a job where you had to interact with a SQL datastore; merely that the interviewer is not going to be able to talk SQL with you.


Since SQL is a declarative language, you have to use it very differently than procedural or functional languages. I've met many people who don't understand the difference and just write procedural code in their SQL queries (cursors, etc). The output may be correct, but performance on anything bigger than a toy dataset is terrible.

If your team builds massively parallel systems in Erlang, you need to make sure a candidate understands at least the basics of its process model and message passing. If your team build high-performance web apps, you need to make sure a candidate understands at least the basics of HTTP and the difference between client and server. For SQL, the same is true: they need to understand at least the basics of the relational model and declarative programming.


These are very simple, though. If you say that you know SQL (or have known SQL, but are a bit rusty) and you can't at least make a strong showing on these questions, you are lying.

This is not complex problem solving, this is problem solving. I'm not sure what memorization has to do with any of this. SQL is a language with only maybe 10-15 important words. It's not like trying to get around Paris without a phrasebook as a non-French speaker.

I've found that using graphical query writers (I only know of the one in access) and ORMs are generally harder than writing the SQL. ORMs are good for keeping code db agnostic, though.


It's called Query by Example and there are quite a few DB management GUIs that have it. I found it pretty intuitive for setting up joins by right-clicking join lines and setting the particular properties of that join. Of course now that I have committed all types of joins to memory it's far faster to write them simply by hand. But my lacking that memorization 10 months ago wasn't really insite into my ability (or lack thereof) as a developer.


Even the worst developers I have worked with could figure out SQL queries like this in an afternoon. If they study before the test, congratulations, we just hired a dud.


You made my point much more succinctly than I. Mainly, this type of candidate testing tells you very little (positively or negatively) whether they're strong developers. Consider an OS developer - the person may have never EVER been exposed to SQL beyond "select * from table" but is probably superior to any candidate that may ace this little SQL test. Mrs. OS developer can learn any amount of SQL syntax (and deep understanding of it as @joshyeager points out) within weeks of being hired.


I think you are assuming that the company is using SQL interview questions to test whether or not the candidate is a strong overall developer than a strong SQL developer. It seems to me that they are testing for a very specific skill and area.


I like the little schema, it flows right into more advanced discussion about how you'd deploy indexes based on the design and queries, how you'd expand the schema in normalized form into supporting an office building seating assignment for each employee, or even multi-sites for employees using a many-many table.

One thing I didn't get was one comment on the article that a guy could struggle thru this with phpmyadmin but not at the console. Maybe he was kidding or trolling. I recently install phpmyadmin to fool around and I can't imagine talking about using it, you'd have to click like fifty thousand times just to implement just a simple query and it would probably take 15 minutes, yet not reduce the cognitive load at all. How do you talk about GUIs in an interview? "Click on the icon of the fornicating centipedes, then on the cthulhu icon, then in the ribbon select the turtle crossing street sign" It makes talking about regex's seem humane in comparison.


I've switched to Chive DB (chive-project.com) precisely to get away from the inanities of the PHPMyAdmin interface, it's much easier to just enter the SQL. (Working on a Chromebook so not using a native application for this.)


There is a very simple way of testing SQL knowledge. And you don't need any of this online tests or white board programming stuff.

Build your self a small sqlite database. Nothing much, but sufficient enough to test the candidates ability write queries. Give him a manual. No internet connection and now give him problems(a few select queries, joins, inserts and may be a few tests here and there to test how good the guy is in schema design). If the guy can write queries after reading the documentation, then hire him.

If he can't write queries, I mean practically on the computer and show you results he is not of much use. Even if he can answer all your white board answers.

This is applicable to any programming interview. If a person can read documentation well and find his way to write a program to solve a problem such a person makes a good hire.


I could nail those with the Django ORM, but I'd struggle to write syntactically correct SQL, not having done it in a while. But it says that your test machine has MS-SQL; with the machine in front of me, I could probably puzzle it out the join quirks with a couple minutes of trial and error.


So the question is, do you represent yourself as a SQL developer? If not, no drama.


I think if you don't give someone a machine you should be pretty forgiving about exact syntax.

Someone who can put together a statement that's broadly right with small errors normally means someone who is rusty (or nervous) but knows their stuff rather than someone who is guessing and a couple of follow up questions will usually confirm that.

General rule for me: don't ask anything that a decent IDE or 10 seconds with a manual / help file / Google would have prevented (unless you've given them a decent ID or similar in which case it's fair game).


How would you solve the first one using Django ORM?


Employee.objects.filter(boss__salary__lte=F('salary'))

Find me employee objects which have a boss salary less than or equal to the salary.


Out of curiosity, would the ORM map to the same SQL query? Or would it request all employee-boss pairs and filter them outside of the DB?

There's a huge performance difference involved.


Potentially huge performance difference.

For the 99% use case, the performance hit fo the ORM is not significant enough to matter. Most projects have many tables, but only one table that actually needs to have any speed optimizations. That one table can go in NoSQL and the rest can be handled by a ORM.


this is a good question. my experience with the django ORM is that in general it resolves any call to an ORM method into SQL and nothing else, but it is not guaranteed to generate the most optimal SQL.

It is however, a somewhat common practice of django devs to to do some post-processing on a queryset in Python. totally acceptable for small querysets with complicated logic, but, yeah, obviously unacceptable for large performance critical queries.


It will only be one query. The django orm won't do any filtering outside the DB that is up to you.


I believe Django would do this as a single query.


It does not apply to the first question, but how does the ORM handle joined columns from other tables? Is it embedded somewhere in the returned object? The best way I could think to handle this using an ORM would be for the object to contain a set of arbitrary key/value pairs to contain joined columns but it seems like a hack.


It is fetched dynamically e.g. employee.boss.department. That will usually cause two more queries to the database. You can use select_related and prefetch_related so that these objects will be loaded into the ORM in one or two queries.


  from .models import Employee
  from django.db.models import F

  print Employee.objects.filter(
      salary__gt=F('boss__salary')
  )


    Employee.objects.filter(boss__salary__lte=F('salary')).values_list('names', flat=True)


Nice set of questions.

I would also use some sample real world data to check if my queries scale well instead of inserting 10 rows and running all queries on it.

http://stackoverflow.com/questions/57068/good-databases-with...


What is the preferred way to aggregate with nulls?

    SELECT  Departments.name, SUM(COALESCE(salary,0))
    FROM Departments 
    LEFT JOIN employees USING departmentID
    GROUP BY 1
The above is how I would solve the last one, but I often feel like I abuse COALESCE.


Using COALESCE here is correct, but sum() returns NULL if there were no non-NULL inputs, so COALESCE should go around it:

COALESCE(sum(salary), 0)


Aggregates generally do the right thing with null without the coalesce.


Thanks. It appears I need to stop overusing coalesce. I was told that sql NULL means "A value that is not yet known", which nicely explains why 1+NULL, 1 < NULL, 1 > NULL, 1 = NULL is always NULL. Now I know that AVG(test_scores) produces the average of the known values automatically.

- - -

I just did a test, and it appears the COALESCE is needed in this case. Running an aggregate where all values are null, results in NULL (the empty department). You need to do something because the total salary of an empty department is known to be zero.


> Aggregates generally do the right thing with null without the coalesce.

Aggregates generally do the most-likely-to-be-right thing with NULL values if there is at least one non-null input to the aggregate. The thing is, if you depend on this, you'll run into real data situations where all the inputs are NULL, the result is NULL, and that's not what you expected.

If you are aggregating over an expression that can be NULL, and you always want a non-NULL answer, you probably need to use coalesce or something similar so that you don't have non-NULL inputs to the aggregate.


A sample SQL file with dummy data would be much appreciated, I think. I guess some of us would like to try their luck.

At least this should be quicker in an interview situation than the usual "normalize this database" or "given this situation, define a database schema".


I made the tables quick to go through them. It could do with more data really but its got a few rows.

https://gist.github.com/abkr/5662615


I had the same idea, but in an SQL Fiddle

http://sqlfiddle.com/#!1/778bb/5


Didn't even know that there was such a thing as SQL Fiddle. That's going to come in handy, beyond this quiz, thanks.


In the same vain here are some very quickly put together answers if anyone is interested. I haven't tested these so they may not work. If you see any errors please let me know :)

https://gist.github.com/dritterman/5662750


I like this set of questions and the simplicity of it. I think I would only add one or two simple questions about INSERT, UPDATE and DELETE statements.

Writing a bad SELECT doesn't tend to have the same ramifications as an incorrect DELETE or UPDATE statement.


Anyone else bothered when primary keys are not given the name 'ID'?


I'm bothered by primary keys full stop.


What's wrong with identity?


A concrete example is some folks like the abstraction of a row as your primary "thing" and some folks like the abstraction that the data defines a row and rows don't really exist just the data.

Consider the hated multiple primary key situation where you've got a autoincrementing prikey and a "real" key where you make an unique index off "full name" or something. So which is the real conceptual primary key? Shouldn't you use the full name as the "primary key"?

Problem: What if the business logic of what a distinct user is changes from unique "full name" to unique "full name" and "telephone number". Whoops now all your foreign keys need messing with, its just a bad scene. Ditto schema changes like you finally change from ascii to utf8 or something, now all your foreign keys need changing (well thats maybe a bad example unless your ascii datatype enforces 7 bits or you're running into byte length vs character length limits...) Or you change the length, which changes the truncation perhaps, which changes your foreign keys. Also you can't just use a rule like all foreign keys are BIGINT now some are CHAR(20) some are FLOAT who knows.

On the other hand lets say you implement just a prikey. Now you can have multiple rows with the same data, because you never set up a UNIQUE INDEX.

Generally speaking if you KNOW absolutely KNOW that your schema will never change, you should probably optimize it to not have multiple keys aka a primary key and unique indexes, or data definition will never change. Very few people can guarantee it so they're better off in the real world with imaginary prikeys.

You can read a lot more about this in "SQL antipatterns" I think chapter 4 or so, but always keep in mind that beyond noob level of being able to define the overall issue, short term snapshots will occasionally (but not always) conflict with longer term thinking.


> Consider the hated multiple primary key situation where you've got a autoincrementing prikey and a "real" key where you make an unique index off "full name" or something. So which is the real conceptual primary key?

If the full name is a real conceptual primary key, you shouldn't have introduced an autoincrement key. If the uniqueness of the fullname is a business rule but not a real conceptual restriction (a distinction which can be hard to make, to be sure), then it makes sense to create the autoincrement key -- and it is the only real primary key. (That is, the autoincrement key represents the concept of identity which isn't present in any of the other data.)

> On the other hand lets say you implement just a prikey. Now you can have multiple rows with the same data, because you never set up a UNIQUE INDEX.

No, you can't, because the "prikey", as you call it, is data, and has meaning -- specifically, it represents identity -- so rows which differ in it do not have "the same data".


The business concept of the "real conceptual primary key" can change, perhaps dramatically, over time, as the business model changes. A real prikey never changes.


> The business concept of the "real conceptual primary key" can change, perhaps dramatically, over time, as the business model changes. A real prikey never changes.

This confuses two different concepts:

If the conceptual model changes, then, yes, the candidate keys (including primary keys) of entities may change between the old model and the new model. This can be a pragmatic difficulty in migrating between different conceptual models, but that's a problem inherent in different conceptual models.

The value of a well-chosen primary key of an entity within any given model should not change, as the primary key should always be a value which identifies the entity such that a different primary key means a different identity.


ON UPDATE CASCADE is in the standard.


Designating a candidate key as primary is a nasty SQL implementation wart.


Liked them; not too hard but also not too easy. I'd have succeeded on the interview if I had been given the chance to test them (and if I wasn't too nervous about it I guess). Never had an interview with technical questions like this before; are you commonly given a chance to test them?

My database and answers dump (Warning: spoilers!) http://pastebin.com/HGBpemHn


The second question is actually trickier than one might think.

The obvious answer - something like

select Name, MAX(Salary) from Employees group by DepartmentId

is wrong.


Call me foolish, but what about the following makes it undesirable? The question didn't ask about a null case of a department with no employees.

-- List employees who have the biggest salary in their departments SELECT em.EmployeeID, em.departmentId, MAX(salary) as salary FROM employees em GROUP BY em.departmentId


In your example, for each row of the result set

* "em.departmentId" will contain one of the distinct values from the "departmentId" column

* "salary" will contain the maximum value of the "salary" column of the table rows whose "departmentId" equals "em.departmentId" of the given result set row.

* "em.EmployeeID" will contain the value of the "EmployeeID" column of one the table rows, whose "departmentID" equals "em.departmentId" of the given result set row, but it is UNDEFINED which one. It IS NOT quaranteed to be the one whose "salary" column equals "MAX(salary)".

See here for examples of how to achieve what is actually needed: http://dev.mysql.com/doc/refman/5.0/en/example-maximum-colum...

As I said, tricky, and, judging from the difficulty level of the other questions, I suspect that the authors of the article have fallen for it themselves.


Thanks for the clarification. 5/6 and dunce hat for me :)


In an interview, it would be wise to mention the special cases that might exist and how you would alter your answer if you had to taken them into account, rather than waiting to be told of the special cases.


I was wondering if the author realized this. Using Microsoft SQL Server, the answer would be to use a window function like row_number or dense_rank and then select where row = 1.


One initial step I use is to give a description of the problem, then have them actually design the table structure themselves. I then have a printed copy of the structure along with some sample data ready, so that I can refer to specific values (and also to help them visualize the contents).


I find it kind of interesting that the first comment on the article says they wouldn't be able to answer the questions by they use an ORM. I'm not really a fan of ORMs, personally, because I don't think it's useful to try and "map" a relational model onto an object-oriented one.


There are some really good SQL libraries out there that make it easy to compose SQL and some of them also include an ORM built on top of the basic abstraction.

SQLAlchemy is a great example of this, I always get precisely the SQL query I would've written myself, except it's syntactically correct, easy to compose and has a chance of being portable.


It's an interesting debate. While I also feel that developers should know the underlying SQL, however all that stuff like joins, indexes etc. are actually very hard to scale beyond one machine. MySQL cluster does attempt to do it automatically, but even it has limits, and places most stuff in memory. In short, if I was looking for developers to do sharding, I would actually prefer to AVOID queries with joins, non-pk lookups etc.

Having said that, I have discovered a heuristic over the years: that if you are using an ORM, you probably don't want a relational database. You should learn something like Riak and let it handle the distribution and provide all the partitioning and availability for you. The CAP theorem shows that you can't get it all, and most likely you want to use one of those data stores instead of a relational one.

For regular sites that won't have millions of users constantly using it, though, a relational db is fine.


“that if you are using an ORM, you probably don't want a relational database. You should learn something like Riak and let it handle the distribution and provide all the partitioning and availability for you”

These are not the same concept: a relational database makes sense when your application relies on relations between records. If you need to do lots of joins across many records, Riak is going to perform horribly because it's designed for a different problem.

CAP says nothing whatsoever about whether you want a relational or non-relational database, merely what tradeoffs you'll have to make to satisfy your business needs.

Using an ORM doesn't factor into this discussion at all other than for providing a convenient place to implement whatever system you devise to meet those needs.


As I said, it's a heuristic. If you find that you are telling your developers to use your ORM, then you probably should have gone with a NoSQL database like Riak. You can still do joins, etc. but it's in the context of things like map-reduce, and it makes sure that you can scale despite the joins.

MySQL way: SELECT * FROM a JOIN b ON x WHERE y

NoSQL way: 1) SELECT * FROM a WHERE x 2) Perform join in app layer or stored procedure.

Like it or not, when you scale you will lose one of the CAP, and NoSQL databases do the hard task of delivering an eventually consistent data store to you and letting you express yourself in the RIGHT context, which is not SQL.


You're still conflating different parts of the stack: an ORM has nothing to do with CAP.

> You can still do joins, etc. but it's in the context of things like map-reduce, and it makes sure that you can scale despite the joins.

Either of your examples are commonly implemented in SQL databases, too: this is a routine MySQL optimization to avoid subselects and, amusingly, one which an ORM makes significantly easier to implement:

SELECT * FROM a WHERE x; SELECT * FROM b WHERE pk IN (…list of IDs from first query…);

Again, the SQL vs. NoSQL question is about your data model and access patterns, not whether you use an ORM or magical thinking about CAP. The line between the two has become quite blurry since there are things like MySQL-backed key-value stores or Postgres extensions which allow it to handle document-store workloads without losing performance or giving up the ability to do flexible queries. This isn't a question of religion: it's just looking at your business, assessing how well you know the access patterns (SQL systems are generally more flexible) and performance requirements and picking the best solution. Anyone claiming to have a right answer for everyone is wrong.

> Like it or not, when you scale you will lose one of the CAP, and NoSQL databases do the hard task of delivering an eventually consistent data store to you and letting you express yourself in the RIGHT context, which is not SQL.

You've now gone from wrong to very dangerously wrong: there is no scale which is immune to CAP and NoSQL has no magic for avoiding this. Eventual consistency is only appropriate for some problems and, as above, can be implemented on either system. No matter what storage system you choose you're still going to have to make careful decisions about business priorities and test carefully.


Much better than this question i got asked once:

"Which is faster: select from a table or from a view?"


I don't see why that is a bad question. A reasonable answer is selecting from a table. Of course it depends on many factors. I often ask questions like this just to get the candidate to tell me why there is not an absolute answer.


Yeah that's the thing. I was talking about computed columns in tables, materialized (or indexed) views, about measuring stuff by experiment and about measuring stuff that actually matters but the interviewer seemed like he wanted to hear a simple answer


Unless, of course, the view is a materialized view.


If a candidate can talk intelligently about materialized views, I think we're past the "explain HAVING" stage of the interview.


This is the perfect interview test. It's easy enough that a candidate can roll through it relatively quickly, but deep enough to prove they have the experience they claim to have. Kudos on having a smart interview test!


It would be really interesting to see solutions for MongoDB. These questions are designed to be easily solvable with SQL, but it would be interesting to see how this can be solved with a totally different technology.


They seem to be quite simple. For what profiles are you asking these questions?. I have used very similar questions for dev ops role.


They're good questions because they go past the two simplest query types:

    select from ... where ...;
and

    select from ... join ... where;
Those are the types 99% of programmers who use SQL for simple CRUD apps know. But they come up short for asking more useful business questions.

Eyeballing the list, it tests subqueries, GROUP BY, HAVING, OUTER JOIN, IN/NOT IN and SUM. Fairly useful primitives for general query writing.

I'd try to add a question that relies on UNION, INTERSECT or EXCEPT.


For interview question, the keyword I like best would be EXISTS. Sadly, many developers write SQL for years without knowing its existence. If I am not mistaken, the propel ORM of symfony doesn't even have native support for it.

As shown by codegeek, the 6 questions here can be answered without needing sub-query. Maybe we can add something like "List employees who are not working alone in their department"


You'd be surprised how many people claim knowledge of relational databases but who could not answer those questions.


Yeah they seem quite obvious really. Serious question: is this really the types of questions asked for a dev job interview? (was still interesting to see.)

>List all departments along with the number of people there (tricky - people often do an "inner join" leaving out empty departments)

inner join seems the non obvious way to do it really IMO.

    select 
    departments.name as "department name",  
    (select sum(salary) from employees where employees.departmentid = departments.departmentid) as "department total salary" 
    from departments


The empty department remark is the clue. They want all departments; if it's an empty department they still want to see that it's an empty department.

An inner join will hide that row because there's no equality between a set (departments) and an empty set (employees in that dept, of which there are none). A correctly structured outer join will.


They're looking for an outer join, I think.


Coincidentally, this is also the list of questions that I need to ask anyone who is recommending a particular NoSQL solution.


Here's a SQLFiddle to try the questions out :)

http://sqlfiddle.com/#!2/9a84e


That's perfect!


Plural table names? <irritated hipster sigh>


wait, they have somebody doing only SQL ?


There is someone that works indirectly for me and that is all they do. A lot of data warehouse and DTL stuff, so SQL is their life.


<Blunt> I'm not a SQL Developer, but anytime I get questions for which Google has answers to be found in 5 minutes or less, I'm quite hesitant to work there.

I respect interviews that go along the lines of: "What would you do if..." after giving a detailed description of their environment. But then again I'm a tools/OSes admin, so maybe it makes more sense for my job description. But anytime guys are too focused on third option you can give to the 'ls' command and don't ask real world questions related to potential or existing issues they have/had, I'm not really interested. I can use google, you know, if that answers your question.

I like questions where the interviewer can see years of experience not the amount of detail memorized from a manual.

I would imagine questions that are good to start with: how you as a developer usually start a design of a database? How do you plan it? I got this question once, it's really good. Can't answer that after reading sql book two days earlier. In contrast to his question. </Blunt>


These aren't "details memorised from a manual". These are fundamental primitives of SQL queries. You simply cannot express most serious questions without them.

If someone came to me and I asked them to write fizzbuzz, an answer of the form "I would google if-thens and modulo and print statements" would be a pretty obvious no-hire.


I would imagine questions that are good to start with: how you as a developer usually start a design of a database? How do you plan it? I got this question once, I thought it was really good. On the spot I could tell the engineer who asked me that is experienced. Can't answer that after reading sql book two days earlier. In contrast to the questions from the post.


In a business setting, the conceptual/relational, logical and physical design of a database happens far less often than querying such databases.

Indeed, one of the reasons normalisation is such a Big Deal to relational bigots like me is that it makes SQL's querying tools much more useful and versatile.


True story, but such questions are just senseless.

If someone knows about set-theory and the ideas behind query-languages, knowing SQL isn't a deal breaker, it's just a nice to have.


Folk generally don't study relational algebra, on its own, for fun.


SQL is one of those technologies that caps how much enjoyment you can derive from understanding the related theory. Scheme, for instance, never stops giving, but you can hit a point with SQL where you start to wonder: where's my Tutorial D?


And where are my proper temporal primitives?

It sucks, but it's what we have for now.


They really should.


If someone knows set theory, they're likely not interviewing for a LOB CRUD job banging out Rails models.


What if you're not looking for an experienced data modeller, but someone who can write SQL?

Also: If you can read the SQL book two days earlier and answer these questions in a reasonable amount of time - sounds like an insta-hire to me. There's certainly nothing so hard about SQL that you couldn't do that - but some people who spend a lot of time working with databases seem to find the climb insurmountable.


When I learnt SQL, the part that took me the longest to square away was the distinction between WHERE and HAVING. It was thoroughly confusing at the time.

Once you grasp that conceptually, SQL generally executes left-to-right, it's easier.


I avoid `having` wherever possible these days. There are data analysts in my team that I know would have to think twice about it if they encountered it in my code, and to be honest, it's a bit of an wart in the SQL grammar anyway.

In (almost) all cases, a subquery with an extra `where` clause will have equal performance to a `having` clause in the main query.

See the final example in: http://en.wikipedia.org/wiki/Having_(SQL)


If my datastore is a SQL database, I don't want you modeling anything if you can't answer these questions.


I think you'd be surprised the percentage of people who would fail to answer these (remembering how many can't answer fizzbuzz). I wouldn't read the answers to them as any measure of real expertise, just you're talking to someone who knows their way round a database and can apply the basics - you'd be shocked how many people who "know" SQL can't use GROUP BY (even when prompted). For a lot of people SELECT, FROM, WHERE is SQL.

In terms of "what would you do if" type questions - they're often no less prone to book answers. What you really want is "give me a specific example of when this happened to you and how did you behave" as they're more likely to test real experience. Sure people can lie but when you start digging into it few people are good enough to continually make up exact details on the spot without getting suspiciously vague (or having behaved in a suspiciously perfect manner).


3rd paragraph of article:

"Some people might say it is too basic, but that's not the point. The test's job is not to tell genius and rockstars from "normal" devs. The purpose is to save you time and quickly filter out DB-experienced guys from the ones that just claim to be."


Rockstar questions would start with window functions.

(Not it!)


You raise a good point about interviewing via trivia questions but I think you'd have to agree that for someone to know a subject there is an inherent amount of information that a person needs to know.

What that line is can be fuzzy.

Consider C, what is the difference between the stack and the heap? yes you can look this up but if a candidate doesn't know this cold then they just can't write C code properly.

Simiarly for sql if a candidate can't write a simple join then it's pretty obvious that they haven't really used sql before.

The sql questions asked here are somewhere around the fizzbuzz level of skill. Any sql user should be able to answer them given a properly setup dev environment.


An interview is a conversation, which is a very high bandwidth medium of information exchange. But in order to get to the level of conversation, both parties need to be speaking the same language. What this test establishes is the baseline level of common language that makes it even possible to have a conversation about SQL databases. Someone who can't handle these sorts of questions is not someone I can usefully have a conversation about SQL with. I can teach them, and sometimes it's useful to have a pedagogical sort of interview; but for a standard, "can you do this job" interview, no, I need to be able to establish a baseline level of common understanding.


Keep in mind that most of the time there are different solutions for a problem. They all finish the job but some are better than others.

Knowing why and which solution is best will get you the job.


I don't run a school. Anytime a web-developer does not know basic sql or other basic knowledge, I refuse to pay salary, and even deduct the 1$ per minute. That five minutes just cost you $10


Not a big fan of on-the-job training, then?

Instead of paying a nominal price for people relearning some primitives for a specialized area of IT/software dev, nowadays most companies want the new kids to go to $80,000 worth of college for the same. That way, the employer gets to skip training, and the student is less likely to leave because he or she has too much damned debt to be mobile.

Maybe I'm projecting.


Actually I am a big fan of it.

But I'm not a fan of people stating that see these kinds of tests as an attack, or see it as something which is unnecessary. There are a lot of people out there are unable to answer these questions, even with "google". And most of them are either unwilling or unable to improve themselves.

I'd rather have a very eager "junior" who loves what he does, and works hard to understand and develop him/herself than someone "senior" who knows a couple of tricks and is too arrogant to do these kinds of tests.

You keep people if: * they can improve / if they learn * if the colleagues are nice * and the product is interesting, aka they have meaning * management is done reasonably * there's a future for them

this is why startups are popular (learn lots of things, meaningful, good upside) as well as corporates (carreer path is flexible, good management, etc).


Refuse to pay salary? Good way to end up with a terrible reputation and/or a broken nose.


We are going to have to deduct a few dollars for your lack of basic multiplication.


Ignoring the fact that 1$/minute * 5minutes = 5$ -- I wonder which _knowledgeable_ person is willing to work with someone who thinks like you do...


I said I wouldn't be paying the salary + even deducting $1 which is $2 per minute.

This person did not qualify to be _knowledgeable_, as he refused to do this test ;)


I'd guess that's a labor law violation in most developed nations.


Bet you have one happy bunch of motivated developers working for you... that's an 'interesting' management style.


Most likely no one.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: