Show HN: Liftbridge – Lightweight, fault-tolerant message streams (liftbridge.io)
147 points by tylertreat on April 28, 2020 | 24 comments


For those wondering about the difference between Kafka/Pulsar and Liftbridge (from the docs):

"Liftbridge was designed to bridge the gap between sophisticated but complex log-based messaging systems like Apache Kafka and Apache Pulsar and simpler, cloud-native solutions. There is no ZooKeeper or other unwieldy dependencies, no JVM, no complicated API or configuration, and client libraries are just gRPC. More importantly, Liftbridge aims to extend NATS with a durable, at-least-once delivery mechanism that upholds the NATS tenets of simplicity, performance, and scalability. Unlike NATS Streaming, it uses the core NATS protocol with optional extensions. This means it can be added to an existing NATS deployment to provide message durability with no code changes. The ultimate goal of Liftbridge is to provide a message-streaming solution with a focus on simplicity and usability."


NATS' author, u/tylertreat, wrote a series of articles on how to build a distributed log that are pretty insightful:

Part 1, Storage Mechanics: https://news.ycombinator.com/item?id=15983185

Part 2, Data Replication: https://news.ycombinator.com/item?id=16021876

Part 3, Message Delivery: https://news.ycombinator.com/item?id=16101223

Part 4, Trade-offs and Lessons: https://news.ycombinator.com/item?id=16181963

Part 5, Sketching a New System: https://news.ycombinator.com/item?id=16215788


I think you mean Liftbridge's author. NATS was created by Derek Collison [1], co-founder of CloudFoundry.

[1] https://news.ycombinator.com/user?id=derekcollison


This is very cool OP. Amazing work.

For anyone looking for something similar, we are adding streams to Postgres - https://github.com/supabase/realtime

It isn’t an exact comparison, but it has some of the features you see here too - you can use wildcards to listen to all changes in your database, or to schema-level, table-level, or row-level changes. The server is built with Elixir, so you can have thousands of listeners connected via web sockets.

It’s not as lightweight as OP’s solution but PG is very familiar.


This is really exciting, nice! We've just started using postgres for our new services and the other day I was just thinking to myself "wouldn't it be great if I could listen to certain database tables because I can then build a real-time data warehouse really easily". How far along is the implementation?


Thanks for the feedback @davedx. Actually it's being used in production by a few companies already (not small companies either). We have a few things we want to do before it's "enterprise ready":

    - set up elixir clustering (i.e. autoscaling)
    - run benchmarks
    - fine-grained client auth
    - ability to listen to multiple databases
> then build a real-time data warehouse really easily

This is the nice thing about PG. It's an operational database that can handle the load of an analytics database. Easy choice! If you need a hand trying this out, message me: copple [at] supabase.io


We made https://github.com/instructure/jsoncdc in Rust, which can do much of the same.


Author here. I started Liftbridge after working as a core committer on NATS and NATS Streaming (as well as my experience using Kafka in the past). In particular, I drew on the experience and lessons learned from implementing replication in NATS Streaming. Happy to answer any questions!


Could you explain the differences between Liftbridge and Kafka?


Some of the key differences:

- Written in Go rather than Java/Scala (big benefit here IMO is getting a small static binary rather than having to run a JVM)

- Doesn't rely on Zookeeper

- Is integrated with NATS (can extend NATS with Kafka-like semantics, but in the future will allow for abstracting NATS away entirely)

- Uses gRPC for its API

- Supports "wildcard topics" - e.g. a stream can listen to the topic "foo.*" and will receive messages published on "foo.bar", "foo.baz", "foo.qux", etc.

- Allows for streams to be paused and subsequently resumed automatically when published to - on the roadmap is "auto-pausing" of sparsely used streams

- Exposes an activity stream that allows you to respond to events such as streams being created, deleted, paused, etc.

One advantage Kafka currently has is its consumer groups, which are on the roadmap for Liftbridge but not implemented yet.

There's a more complete comparison available here: https://liftbridge.io/docs/feature-comparison.html
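To make the "wildcard topics" point above concrete, here is a toy sketch of NATS-style subject matching, which Liftbridge's wildcard streams build on. This is purely illustrative - a hypothetical helper, not Liftbridge's or NATS' actual implementation - but it follows the documented NATS semantics: subjects are dot-delimited tokens, `*` matches exactly one token, and `>` matches one or more trailing tokens.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Illustrative sketch of NATS-style subject matching.

    '*' matches exactly one token; '>' matches one or more
    trailing tokens. Not the real implementation.
    """
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # Full wildcard: must match at least one remaining token.
            return len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    # Without '>', the token counts must line up exactly.
    return len(p_tokens) == len(s_tokens)
```

So a stream on "foo.*" would see "foo.bar" and "foo.qux" but not "foo.bar.baz", while "foo.>" would see all three.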


And for exactly these reasons I absolutely LOVE this project. Big Go user. Finding this today was a pure gift. Thank you very much for making these excellent design choices and for building this! I hope to contribute in the future.


For those not already familiar with NATS:

https://docs.nats.io/nats-streaming-concepts/intro


Could you compare Liftbridge to Akka with the persistence module and/or streams? I'm looking for a lightweight, embeddable Kafkaesque thing (esp. interested in exactly-once delivery) that I can use to glue various services together.


I'm not super familiar with Akka beyond high-level/actor model so I can't competently compare the two. NATS can be used to implement an actor model and has a similar philosophy to Akka core in terms of delivery semantics (i.e. no real guarantees - https://doc.akka.io/docs/akka/2.1/general/message-delivery-g...).

Looking at Akka Streams, it appears to be more in line with Liftbridge, though I'm not totally sure. Liftbridge, like Kafka, is a message log, so messages do not get acked or removed from a stream (unless removed due to retention limits or compaction). I'm not sure if that's the case with Akka Streams.

I'm curious what your needs are around exactly-once delivery. From what I can tell, Akka only supports at-least-once semantics and is of the opinion that exactly-once isn't really possible as such (which is an opinion I agree with, but that's a more nuanced discussion). https://www.lightbend.com/blog/how-akka-works-exactly-once-m...


Not OP but you may find MQTT useful.

http://mqtt.org/


Thanks, is there an option for guaranteed delivery with MQTT?


What you want is the quality of service "QoS" flag set at connection time (edit: sorry, correction, publish time):

https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.h...

Bear in mind that there are always limits to how guaranteed a "guaranteed delivery" actually is. "Exactly once" means that the broker has received the message and the client has acknowledged that. What happens beyond the broker is outside the scope of the protocol and would have to be handled at the application level, e.g. using transaction ids.
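To illustrate what "exactly once" buys you inside the protocol, here is a toy sketch (not a real MQTT implementation, and not Paho's API) of the receiver side of the QoS 2 handshake: PUBLISH -> PUBREC -> PUBREL -> PUBCOMP. The point is that the receiver deduplicates retransmitted PUBLISH packets by packet id until the sender releases the id - everything downstream of that is the application's problem.

```python
class QoS2Receiver:
    """Toy sketch of MQTT QoS 2 receiver-side state. Illustrative only."""

    def __init__(self):
        self.pending = set()   # packet ids received but not yet released
        self.delivered = []    # messages handed to the application

    def on_publish(self, packet_id, payload):
        # A retransmitted PUBLISH with a pending id is NOT re-delivered.
        if packet_id not in self.pending:
            self.pending.add(packet_id)
            self.delivered.append(payload)
        return "PUBREC"  # acknowledge receipt either way

    def on_pubrel(self, packet_id):
        # Sender releases the id; it may now be reused safely.
        self.pending.discard(packet_id)
        return "PUBCOMP"
```

With a real client library such as Eclipse Paho, the QoS level is chosen per publish, e.g. `client.publish(topic, payload, qos=2)`.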


Not a question, just a quick comment: your overview page assumes familiarity with a bunch of concepts. If you could find the time, perhaps it would be helpful to write up a high-level explanation that makes none of those assumptions.

For instance, it sounds like it's a cluster-based distributed system for some clients to post messages and other clients to get notified about them through pattern matching on a standardized protocol, without any messages getting lost.


Thanks, this is a good suggestion. The docs definitely need a lot of work still!


I've been watching this with some interest as the successor to NATS Streaming. One question I have is how does it fare in potential-high-latency, geographically distributed situations? E.g. what if I had dozens of servers at the edge worldwide wanting to receive events from one another?


We've been experimenting with Liftbridge at a range of scales and workloads, and clusters that are distributed across a single region perform very well. However, it's not really suitable for operating a cluster distributed across multiple regions because the latency adversely impacts the bandwidth of the Raft protocol that's used for cluster-wide coordination; this impacts the rate at which streams can be created, and the rate at which changes to In-Sync Replica sets (ISRs) can be propagated. As suggested, a federation of clusters, each covering a single region, would be required.


Good question. It's early days, so there's still a lot of work to be done around geo-replication/geo-aware. It depends on the specific use case, but there are a few approaches you could take. You could run a Liftbridge and NATS cluster in each DC with the NATS cluster connected to a global NATS supercluster. Alternatively, you could just run separate Liftbridge clusters and do replication between them (sort of like the MirrorMaker approach in Kafka, though the tooling here is still lacking for Liftbridge).

Longer term, I'd like to take advantage of the recent work in NATS around superclusters and geo-aware subscribers to provide better geo-replication primitives directly inside Liftbridge (https://www.slideshare.net/nats_io/deploy-secure-and-scalabl...).


The project has basically jumped from v0.0.1 to v1.0.0 within 4-5 months ... but how does it fare with reliability, throughput, ordering, and delivery guarantees?

Is it production-ready? Are there any use cases or project sizes this is NOT recommended for? E.g. sensitive events such as payment or e-commerce traffic vs. noisy web-traffic logs.


How does it compare in performance?



