betty.land

BeTTY

BeTTY (or simply "betty") is a topic-based local-first knowledge graph protocol that combines the most useful aspects of Project Xanadu, Plan 9 From Bell Labs, Gopher, IPFS, Wikis, and NoSQL into a human- and machine-readable alternative to the World Wide Web.

BeTTY aims to avoid a lot of the cruft accumulated during the past 30 years of experimentation with the World Wide Web, a useful but ultimately troubled protocol. In bettyland, we ask: what if we took a different path? What if we could retake a crucial fork in the road to what we used to call the Information SuperHighway?

For BeTTY, that moment was represented by Gopher, a text-based competitor to the WWW. Gopherspace's "index server" search engines (WAIS, Veronica and Jughead) were based on the first Internet indexing search engine, Archie (which crawled FTP servers).

Gopher was built at the University of Minnesota to resemble a worldwide, interconnected hierarchical filesystem; everything served was either a resource or a menu of resources and/or inline text. Betty, on the other hand, is built to resemble gopher.

Welcome to Bettyland

A public domain image of Betty Cooper from Pep Comics

What if, after all these Archie Comics references, they finally got around to naming something after the only intelligent Riverdale character, Betty Cooper? The rest of them are all chuckleheads anyway.

BeTTY began as a simple proposition: what if we implemented gopher, but using flexible, machine- and human-readable JSON instead of the clunky old spreadsheet-style gophermap? What if instead of a "semantic web" we built a semantic gopher?

The JavaScript Object Notation (JSON) standard describes a document as a value, which is either a simple item (a number, a bit of text, true, false, or null), a list of values, or an object made of key-value pairs whose values can themselves be lists or whole nested documents. This is a valid JSON file:

  {
    "Key1": "Some text",
    "Key2": ["A","list","of","things","including","numbers",0,1,2],
    "Key3": {
      "This":"could be considered a whole new document if detached."
    },
    "BooleanValue": true,
    "Anything": 1.23456
  }

This maps well onto the gopher paradigm of resources and lists of resources, but it opens up a lot of other possibilities for data use. Every BettyDoc is either a single topic (a discrete piece of information) or a list of topics and/or inline text (we call this text a message). But bettyland isn't hierarchical, it's federated, which is also how gopher actually worked.

Forget "CRUD," meet LUMA

The current convention for Web-based APIs is based on REST and CRUD. To quote IBM:

REST APIs communicate via HTTP requests to perform standard database functions like creating, reading, updating, and deleting records (also known as CRUD) within a resource. For example, a REST API would use a GET request to retrieve a record, a POST request to create one, a PUT request to update a record, and a DELETE request to delete one. All HTTP methods can be used in API calls. A well-designed REST API is similar to a website running in a web browser with built-in HTTP functionality.

Since BeTTY is a federated topic-based graph rather than a database, its understanding of data is a bit more nuanced than the limits provided by the CRUD paradigm. For example: how do we guarantee version control and permissioned writes of data to systems outside of the original scope of a document, i.e., when topics are shared between nodes? How do we distinguish between updating some internal part of a document and appending new information to an existing queue without changing the previous contents, i.e., writing vs appending a file? And how does a standard for search fit in?

BeTTY stays RESTful, but wipes away the CRUD by implementing a new paradigm we call LUMA. (We could have refactored it to LUA, but there's already a great open source language by that name.) This is an acronym for betty's own set of basic REST API verbs:

List, which is analogous to Read as well as search, with requests submitted via GET and POST methods. If your request matches multiple documents, you'll get a list of links; if there's only one document returned, you'll get the entire file.
Upsert, which handles Create, Update and Delete functions via POST.
Message, which allows Mastodon-compatible error, resource request, logging and other communication between nodes whether on- or offline.
Append, which handles append-style Create and Update writes to files, even if those files are unreadable by the writer (aka blind-append). The "Message" function is simply a blind-append applied to an inbox or other message queue.
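
As a rough illustration of how the four verbs relate to each other, here is a minimal in-memory sketch in Python. Nothing in it is BeTTY's actual API: the ToyNode class, its method names, the "inbox:" queue naming, and the choice to express a delete as an upsert of None are all assumptions made for the example.

```python
# Hypothetical sketch of the LUMA verbs against an in-memory store.
# ToyNode and every method signature here are illustrative assumptions.

class ToyNode:
    def __init__(self):
        self.docs = {}    # topic ID -> document (a dict)
        self.queues = {}  # queue name -> list of appended entries

    def list_docs(self, predicate):
        """List: one match returns the whole document, otherwise a list of IDs."""
        matches = [tid for tid, doc in self.docs.items() if predicate(doc)]
        if len(matches) == 1:
            return self.docs[matches[0]]
        return matches

    def upsert(self, topic_id, doc):
        """Upsert: create or update a document; None deletes it (assumption)."""
        if doc is None:
            self.docs.pop(topic_id, None)
        else:
            self.docs[topic_id] = doc

    def append(self, queue_name, entry):
        """Append: add an entry without reading or changing earlier ones."""
        self.queues.setdefault(queue_name, []).append(entry)

    def message(self, recipient, text):
        """Message: a blind-append applied to the recipient's inbox."""
        self.append("inbox:" + recipient, {"message": text})
```

Note that message() never touches the queue's existing contents, which is the point of a blind-append: the writer needs no read access to the inbox.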

Everything else in bettyland, as they say, is commentary: you can link inputs and outputs of various LUMA queries to your own functions to create new verbs and extend BeTTY's capabilities.

Forget "Databases," meet Tags

Where other information stores use hierarchical concepts like 'tables' and 'collections' and so forth, BeTTY prefers not to silo things that way because we're interested in finding the connections between pieces of disparate data we can access within bettyland.

Instead of the old 'folder' or 'namespace' concepts in file systems or databases, BeTTY stores everything in one big pool. We invert the 'collection' concept by allowing data to exist in multiple sets at once, with the tag system.

Tags are associated with topics in a many-to-many fashion, and tags can be added to documents at any time. However, a tag imposes indexing requirements on all tagged documents upon write or update. This is BeTTY's data conformance method: it reports to the system whether each indexed field exists within the document (and, if so, its value), provided the node has access to the private key for this file.

In this way, you can impose standards on documents, even rewriting local copies of your documents to include or derive indexed fields based on the requirements of a new tag. Documents can conform to multiple indexing standards at once, which is the equivalent of linking the same record in multiple databases and tables.
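
A small Python sketch may make the conformance idea more concrete. The tag names, the field names, and the shape of the report are invented for illustration, and the sketch leaves out the encryption check entirely.

```python
# Hypothetical sketch: tags as indexing requirements.
# Tag definitions and field names below are invented for the example.

TAG_INDEXES = {
    "recipe": ["title", "servings"],
    "published": ["title", "date"],
}

def conformance_report(doc_meta, tags):
    """For each field indexed by any of the document's tags, report whether
    the field exists and, if so, its value."""
    report = {}
    for tag in tags:
        for field in TAG_INDEXES.get(tag, []):
            report[field] = {
                "exists": field in doc_meta,
                "value": doc_meta.get(field),
            }
    return report

# One document carrying two tags at once (many-to-many).
meta = {"title": "Pancakes", "servings": 4}
report = conformance_report(meta, ["recipe", "published"])
```

Here the document satisfies the "recipe" tag but is missing the "date" field that "published" wants indexed, so a node could either flag it or derive the field on its local copy.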

Forget "Passwords," meet Encryption-Based Permissions

Oh, that's right, did we forget to mention everything is encrypted at rest and public/private keys are used instead of passwords? Instead of worrying about federating authorization and authentication across a vulnerable network, BeTTY jettisons all of that insecure nonsense and splits up "auth" very simply. A BeTTY node may authorize a request, but the client performs the authentication themselves: if you don't have the private key for the public key you supplied the server when you requested the information, you can't access the file.

This allows BeTTY to send everything the way Gopher intended: over port 70 and without bothering with all that TLS/SSL or LDAP infrastructure. We just have keyserver integration for PGP public keys; you can use any email address as your unique bettyland identifier, and update your keys whenever you need.

BeTTY's access-control mechanism updates the traditional UNIX model with an understanding of how information flows on systems like the Internet, and how that differs from the traditional concepts which defined computer data access for the past 50 years.

UNIX uses octal numbers to denote three types of access for a user or group: read, write and execute.

BeTTY uses octals to build access-control lists for public keys (which are unique by email address and fingerprint). Permissions for bettyDocs are a little different: copy, update, and index.

BeTTY understands that once you can read something, you can copy it. However, each node has its own version control history, and the right to update a file on the origin server is granted at the discretion of the owner of that file. Similarly, the visibility of a file in indices and the searches which use them should also be controlled at the discretion of the owner.
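
To sketch how an octal access-control list along these lines might look, the Python below assumes a bit layout that mirrors UNIX (copy = 4, update = 2, index = 1). That layout, the placeholder fingerprints, and the allowed() helper are assumptions for illustration; the real values may differ.

```python
# Hypothetical bit layout mirroring UNIX r/w/x; BeTTY's real values may differ.
COPY, UPDATE, INDEX = 0o4, 0o2, 0o1

# ACL entries are keyed by (email address, key fingerprint).
acl = {
    ("betty@example.com", "FINGERPRINT-A"): 0o7,   # copy + update + index
    ("archie@example.com", "FINGERPRINT-B"): 0o5,  # copy + index, no update
}

def allowed(acl, email, fingerprint, permission):
    """True if the key identified by (email, fingerprint) holds the permission.
    Unknown keys get no permissions at all."""
    return bool(acl.get((email, fingerprint), 0) & permission)
```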

Forget "filenames," meet Content-based Addressing and Encrypted Content Indexing

Since a file may be encrypted at rest to a public key for which you don't yet have a private key, we need a way to make the data discoverable if not entirely readable. BeTTY uses a content-based addressing scheme which creates a unique identifier based on the content of a topic, then lets you add any metadata, headlines, summaries, tags, and so forth about that data separately to the meta key of your bettyDoc.

The big drawback of content-based addressing is that it makes each address immutable, meaning that the information stored at that address can never be updated. BeTTY feels that this is no fun, and would rather implement easy distributed version control and a local backup archive, along with UNIX-style symbolic links between topics, to do a better job of keeping data accessible.

Every valid bettyDoc has a unique hashed identifier called the revisionID. If no such revisionID is found when writing to the local backend, this string of base64 characters is also used as the topicID. Future updates to a topic change the revisionID, not the topicID; you can even add a salt to the content hash to ensure your update is noted.
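
Here is one way the revisionID and topicID relationship could be sketched in Python. The hash function (SHA-256), the URL-safe base64 alphabet, the canonical JSON serialization, and both function names are assumptions; the text above only says the identifier is a hash rendered as base64 characters. The sketch also reads "no such revisionID is found" as "this is the first write of the topic", which is an interpretation.

```python
import base64
import hashlib
import json

def revision_id(content, salt=""):
    """Hash the canonical JSON form of the content, plus an optional salt.
    SHA-256 and URL-safe base64 are assumptions, not BeTTY's stated choices."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((salt + canonical).encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")

def write_revision(store, content, topic_id=None, salt=""):
    """Record a revision in store (topicID -> list of revisionIDs).
    A first write reuses its revisionID as the topicID."""
    rev = revision_id(content, salt)
    if topic_id is None:
        topic_id = rev
    store.setdefault(topic_id, []).append(rev)
    return topic_id, rev

store = {}
topic, first = write_revision(store, {"text": "hello"})
_, second = write_revision(store, {"text": "hello, world"}, topic_id=topic)
```

After the two writes, the topic keeps its original address while its revision history holds two distinct revisionIDs.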

Meet the BettyDoc

Every document in bettyland must conform to a simple JSON standard, very much like XHTML files must include <html>, <head>, and <body> elements.

BeTTY's three mandatory keys are

betty, which stores information which is only relevant to the local node;
meta, which stores all information about the contents of this document, including all indexed fields, authorship, headlines and descriptions and so forth;
The fileType key, which tells betty what kind of document this is. There are four fileTypes (so far), and the last key must be exactly one of them:
- content, for a single piece of information;
- index, for an indexed array of links to content and other information, each section with its own unique field names;
- directory, which is an endpoint where search results are collected in an unordered, not-guaranteed-to-be-unique list; or
- queue, which is a conversation between two or more participants where every message is uniquely keyed by content. (A queue is a special case of directory.)
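
A minimal structural check for these rules might look like the Python below. It assumes the third key is literally named after the fileType (content, index, directory, or queue), which is how we read the description above; the function name and the error strings are hypothetical.

```python
# Hypothetical validator for the three mandatory keys of a bettyDoc.
FILE_TYPES = {"content", "index", "directory", "queue"}

def validate_bettydoc(doc):
    """Return a list of problems; an empty list means the doc passes."""
    if not isinstance(doc, dict):
        return ["document must be a JSON object"]
    problems = []
    for key in ("betty", "meta"):
        if key not in doc:
            problems.append("missing mandatory key: " + key)
    # Exactly one of the four fileType keys must be present (assumed reading).
    type_keys = FILE_TYPES.intersection(doc)
    if len(type_keys) != 1:
        problems.append(
            "expected exactly one fileType key, found %d" % len(type_keys)
        )
    return problems

example = {
    "betty": {},                        # local-node-only information
    "meta": {"headline": "Hello"},      # indexed fields, authorship, etc.
    "content": "A single piece of information.",
}
```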

Managing and merging documents

Suggesting that there be a uniform search/upsert/append mechanism for every available information resource on a network is all well and good, but what about when there are conflicting edits?

In more technical terms, you might wonder: which concurrent version control method is betty proposing? What kind of CRDT (conflict-free replicated datatype) are you thinking about using for this?

Betty is very much into letting you figure all that out for yourself; there are plenty of great automated engines and good old manual review technologies available for this kind of thing. Resolving your personal or organizational data conflicts is something betty supports, but doesn't want to get in the middle(ware) of, because our goal here is to slide as far down the application stack as we can, and let you build things on top of BeTTY you could never do with the World Wide Web.

We want you to plug your CRDT solution into BeTTY in one of three ways: by using one of the three pieces of software described below; by building a plugin for (at least) one of them; or, and we mean this very sincerely, by doing a better job than we can of building, in the language of your choice, a betty-compatible replacement for the functionality we describe in the API.

And while you're at it, why not define separate merging methodologies by tag so you don't have to stick to just one?
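
For instance, a per-tag merge registry could be as small as the Python sketch below. The tag names and both merge strategies are made up for illustration, and neither strategy is a real CRDT; they only stand in for whatever engine you plug in.

```python
# Hypothetical per-tag merge registry; strategies are placeholders.

def last_writer_wins(local, remote):
    """Discard the local version in favor of the incoming one."""
    return remote

def union_of_lists(local, remote):
    """Keep every local entry, then add remote entries not already present."""
    return local + [item for item in remote if item not in local]

MERGE_BY_TAG = {
    "notes": last_writer_wins,
    "guestbook": union_of_lists,
}

def merge(tag, local, remote, default=last_writer_wins):
    """Pick the merge strategy registered for the tag, else the default."""
    return MERGE_BY_TAG.get(tag, default)(local, remote)
```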

Forget "Model-View-Controller," meet bettyd, btty, and httbd

Web frameworks often work with the "MVC" concept: there's a database, then something that controls access to that database, and then something that presents this data to the user.

BeTTY has a similar separation of functions in the three main packages: bettyd is a node server which reads and writes to data backends; btty is a command-line interface to bettyd via the BeTTY API (and can even instantiate/scale bettyd servers); and httbd is a betty/XHTML gateway which allows you to construct custom dashboards, apps, interfaces, etc. with CSS and JavaScript.

Bettyd, the betty node server

Bettyd is tasked with passing all of the instructions you can possibly jam into the LUMA API to a data storage/processing plugin, otherwise known as a backend. The node then replies with the appropriate messages, documents, indexes and/or directory endpoints.

Since we're building a semantic gopher to correct everything that went wrong with the web, bettyd uses old standards, but better. So bettyd serves encrypted and/or compressed JSON "betty://" URLs via HTTP/1 on port 70. No awkward TLS/SSL handshakes here.
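
As a sketch of what a client might do with such an address, the Python below splits a "betty://" URL and falls back to port 70 when none is given. The host/path layout of the URL and the function name are assumptions; only the scheme and the default port come from the text above.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 70  # the Gopher port, per the text above

def parse_betty_url(url):
    """Split a betty:// URL into (host, port, path).
    The host/path layout is an assumed convention, not a documented one."""
    parts = urlsplit(url)
    if parts.scheme != "betty":
        raise ValueError("not a betty:// URL: " + url)
    return parts.hostname, parts.port or DEFAULT_PORT, parts.path or "/"

host, port, path = parse_betty_url("betty://node.example.com/some-topic-id")
```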

The servers (betty prefers to call them nodes) in bettyland are not meant to be always-on (i.e., "available" in the CAP model). Because bettyland is local-first, nodes running bettyd focus on allowing access to the locally defined data storage backends and plugins, which store copies of resources pointed to by searches and transclusions.

Bettyd server processes scale with active job queues; while awaiting federated messages from across bettyland, your servers need not be active (all the time) if they're periodically polling trusted servers to relay requests and notifications.

The Default Flat File Backend

The default backend for bettyd is the flatFile plugin, which uses JSON files compressed and encrypted at rest to the server's public key, which should be the same as your public key. Unlike in Project Xanadu, not all servers are unique (because that's not how scalability works today), but they are identified by the public key of the user or group/alias who starts up that node by entering their private key passphrase.

Btty, the command-line Betty client

The btty command seamlessly handles encryption, transclusion, pagination, searches, messaging, and anything else you can do in bettyland. Btty can manage bettyd nodes, run scaled mapReduce operations on large data sets, automatically update your searches in real time, bulk import data into your local nodes, whatever you need!

Btty renders data before presentation. Since all betty requests are one-for-one document exchanges, transclusions and federated requests are handled on the client side by btty. Btty also sorts and paginates results, as bettyd prefers to do the fileserving and lets clients do the data processing.
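
Client-side sorting and pagination of this kind can be sketched in a few lines of Python. The function name, the 1-based page numbering, and the default page size are assumptions for the example, not btty's actual behavior.

```python
# Hypothetical client-side helper: the node just serves documents,
# and the client orders and slices them.

def paginate(results, sort_key, page, page_size=10):
    """Sort results with sort_key, then return the requested 1-based page."""
    ordered = sorted(results, key=sort_key)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]

results = [{"headline": "c"}, {"headline": "a"}, {"headline": "b"}]
first_page = paginate(results, lambda d: d["headline"], page=1, page_size=2)
```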

Httbd, the dynamic XHTML gateway

Build a cross-platform, cross-device app with httbd, which provides a high-level interface to btty. Bring your favorite progressive web app stack to handle JSON directly via a simple interface.