Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

NULL in SQL is often interpreted in many different ways. The most helpful I’ve found is to think of it as unknown. Postgres has the IS DISTINCT FROM operator to capture what you’ve intended above:

    ... WHERE my_column IS DISTINCT FROM 5


I wasn't aware, thanks for the tip. Is there any equivalent for sets of values? For example:

  SELECT * FROM my_table WHERE my_column NOT IN (5, 6)


I think what you are looking for is a compound condition:

     SELECT * FROM my_table WHERE my_column != 5 OR my_column IS NULL;
This is because what you are selecting for is two conditions: when the value is != 5 and when the value is NULL so the result of != 5 is unknown.

FWIW I agree with you that NULL's are counter-intuitive. While I am more or less aware of all the various ways to account for them, I still gravitate towards SQL schemas without NULL's since I prefer the intuitiveness of 2VL when writing or reading SQL.


Something along the lines of

  COALESCE(my_column, -1)
might work for you in this case. See [1] for documentation on it in Postgres (it is in ANSI SQL though).

[1]: https://www.postgresql.org/docs/current/functions-conditiona...


...and is not null


That will have no effect on the query.


fine, here is the full query since extrapolating from incomplete data is hard for you:

SELECT * FROM my_table WHERE my_column NOT IN (5, 6) AND my_column IS NOT NULL

happy now?


These two queries will always return the same results:

  SELECT * FROM my_table WHERE my_column NOT IN (5, 6)

  SELECT * FROM my_table WHERE my_column NOT IN (5, 6) AND my_column IS NOT NULL
Because "my_column NOT IN (5, 6)" will exclude NULL values.


Right, my bad. Upper parent was about using IN and having return everything "but" 5 and 6. So here the select that will do that:

select mumu from kaka where mumu not in (5, 6) or mumu is null.

That will return everything, null included, except for 5 and 6.

But wait, there is more. If, for example, performance is the main issue here above query is quite slow. Even on an indexed table on column mumu, it will still do a full scan of the table before returning. How to improve performance in this case? Well, you use LEFT join on itself. Implementation is left as exercise for reader :D.


does postgres really not index nulls in a useful way? mysql does, though it may only work efficiently on a single val-or-null comparison at a time.


nobody does. MySQL, Oracle, MSSQL, you name it. All sux. That's why I prefer to always declare NOT NULL and have a DEFAULT value when I create tables. Treat the default value as NULL and you'll increase performance a lot.


PostgreSQL will happily use an index when looking up nulls. Given an index over an integer column, a query for a constant non-null value appears like:

                               QUERY PLAN                              
  ---------------------------------------------------------------------
   Index Scan using foo_b_idx on foo  (cost=0.29..8.30 rows=1 width=8)
     Index Cond: (b = 333)
The exact same query plan is generated for a query looking for nulls:

                               QUERY PLAN                              
  ---------------------------------------------------------------------
   Index Scan using foo_b_idx on foo  (cost=0.29..8.30 rows=1 width=8)
     Index Cond: (b IS NULL)
(That is, the condition on the scan is the only thing that differs.)

(I would heavily suspect that both MSSQL and MySQL have similar behavior here; this is an easy optimization for a query planner.)


MySQL does too, yes, the comment is clearly incorrect: https://dev.mysql.com/doc/refman/8.0/en/is-null-optimization...

And it has done so for many versions, 8 is just the default redirect I got. The very first sentence on that page is:

>MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value.

It works effectively everywhere, with the restriction that it can only do one null-optimized comparison per index (e.g. `(a == 1 or a is null) AND (b == 1 or b is null)` will only optimize one piece). Which is a potentially-significant restriction that makes me curious about the internals... but it does exist, and typical for MySQL is clearly documented.


So I guess ,,OR column IS NULL'' would have been the correct answer?


Pretty sure most DBs do not treat that any differently than

`select * from table where column not in (5,6)`

Can you name one that does?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: