Range queries with Badger - Garren's public notes

[Badger](https://dgraph.io/docs/badger/) is a pure-go implementation of a [[Databases/Log Structured Merge Tree|Log Structured Merge Tree]]. You typically use it as a Key/Value store with transactional support. I've been trying out a few ideas with it and found it fun to work out. I did find the documentation to be a little sparse and it wasn't entirely obvious how to do different types of range queries. So here are my notes on range queries. ### The setup We need some data to query. We going to create two `subspaces` so that we can confirm that our range queries only read from one `subspace` at a time. A `subspace` is the same as a `table` in a relational database. When we scan badger we want to read from one `subspace` at a time so that we don't mix up data. ```go func Insert(db *badger.DB, subspace []byte) error { tx := db.NewTransaction(true) defer tx.Discard() for i := 0; i < 10; i++ { key := Pack(subspace, []byte(fmt.Sprintf("row-%d", i))) if err := tx.Set(key, []byte(fmt.Sprintf("value-%d", i))); err != nil { return err } } return tx.Commit() } // Pack is a simple function to join a key to a subspace func Pack(subspace []byte, key []byte) []byte { keyBits := [][]byte{subspace, key} return bytes.Join(keyBits, []byte("-")) } ``` The insert generates 10 `{subspace}-row-{i}:value-{i}` rows for a given subspace. ### Full Scan The first range query is a full `subspace` scan. Importantly, we want to read forwards and backwards, so there is extra logic required to make sure we start and end our full-range scans at the correct places. For a forward scan, we `seek` or set the iterator pointer to the start of the `subspace` and keep reading until the returned key is not part of the `subspace`. For a reverse scan, we need to `seek` past the subspace and then iterate backwards over the subspace, if we don't do that Badger doesn't return anything. So we have a small `End` function that creates a start key that is 1 bigger than the subspace. ```go // Reads all the keys in a subspace // For `reverse=true` we need to seek to find a key that is one larger // than the subspace range and then scan backwards from there func ReadFull(db *badger.DB, subspace []byte, reverse bool) error { return db.View(func(tx *badger.Txn) error { iter := tx.NewIterator(badger.IteratorOptions{ Reverse: reverse, Prefix: subspace, }) startK := subspace if reverse { startK = End(subspace) } defer iter.Close() for iter.Seek(startK); iter.ValidForPrefix(subspace); iter.Next() { val, err := iter.Item().ValueCopy(nil) if err != nil { return err } fmt.Printf("%s : %s \n", string(iter.Item().Key()), val) } return nil }) } // Copy the prefix into a new slice that is one larger than // the prefix and add an `0xFF` byte to it so func End(prefix []byte) []byte { end := make([]byte, len(prefix)+1) copy(end, prefix) end[len(end)-1] = 0xFF return end } ``` ### Range Reads A range read will read the keys in the `subspace` that are between the start and end key. I've made it so that the range read is inclusive of the start and end key. We use the `Pack` function to combine the key and the subspace so that the iterator only reads from the single subspace. When reading forward, the iterator is set to the start key and then reads until it either reads to the end of the subspace or reads a key that is larger than the end key. For reading in reverse, the iterator will seek to the end key and read backwards until it reads a key that is less than the start key or it reaches the end of the subspace. ```go // Read between two keys, when going forward if the key is greater than the end key then stop // and for backwards check if the key is less than the start key and then stop func ReadRange(db *badger.DB, subspace []byte, start []byte, end []byte, reverse bool) error { return db.View(func(tx *badger.Txn) error { iter := tx.NewIterator(badger.IteratorOptions{ Reverse: reverse, Prefix: subspace, }) endK := Pack(subspace, end) startK := Pack(subspace, start) if reverse { startK = Pack(subspace, end) endK = Pack(subspace, start) } defer iter.Close() for iter.Seek(startK); iter.ValidForPrefix(subspace); iter.Next() { key := iter.Item().Key() if reverse { if bytes.Compare(key, endK) < 0 { break } } else { if bytes.Compare(key, endK) > 0 { break } } val, err := iter.Item().ValueCopy(nil) if err != nil { return err } fmt.Printf("%s : %s \n", string(iter.Item().Key()), val) } return nil }) } ``` And that is how to range read from Badger. If you are looking for an challenge to extend this, take the current code and extend it so that it returns an iterator so that it could be used like this: ```go func IteratorUsage() { iter := ReadRange(db, "subspace", "startkey", "endkey") var kv KeyValue for iter.Next(&kv) { fmt.Println(kv.Key, kv.Value) } } ``` The full code can be found at this [gist](https://gist.github.com/garrensmith/243035b430c43a32a6f8240c9f875c57).