It would certainly be possible to put the mesh together using nothing but Layer 2 switching and bridging (see the background section if these terms are unfamiliar). Nonetheless, some features that would be highly desirable in a relatively complex network topology, such as the mesh will probably become, can only be achieved with routers. Further, if the mesh grows as much as is hoped, a single subnet design (which a switch and bridge only network would necessitate) would eventually cause serious problems.
The features we ought to have available include:
Only dynamic routing can allow and make use of redundant links. A router is able to make decisions about which link to use based on a set of configurable measures.
Once the redundant links exist, we need to be able to take advantage of them, so that if a link goes down, an alternative path around the failed node will be automatically found and used. That way, if a truck runs into someone's house and takes their node down (or, more likely, heavy rain obscures their line of sight), the mesh only notices a difference in bandwidth and an altered routing table.
Even where links do not actually go down, it would be nice if the routers we use could distribute the traffic load across the available paths in proportion to the bandwidth available on each path. This is a much better scenario than everyone sharing the one 11Mbps pipe (which would block up as soon as someone tried to download, for instance, the latest DeadRat iso images :-).
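As a rough illustration of what "in proportion to the bandwidth available" means, here is a minimal sketch. The path names and bandwidth figures are invented for the example, and how a real router actually splits traffic depends on the routing protocol and its implementation.

```python
def traffic_shares(path_bandwidth_mbps):
    """Return the fraction of traffic each path would carry if the load
    were split in proportion to each path's bandwidth."""
    total = sum(path_bandwidth_mbps.values())
    if total <= 0:
        raise ValueError("at least one path must have positive bandwidth")
    return {path: bandwidth / total
            for path, bandwidth in path_bandwidth_mbps.items()}


# Hypothetical example: one full-rate 11Mbps link and two half-rate links.
shares = traffic_shares({"via-north": 11.0, "via-east": 5.5, "via-south": 5.5})
for path, share in shares.items():
    print(f"{path}: {share:.0%} of the traffic")
```

With these made-up figures the full-rate link carries half the traffic and the other two carry a quarter each.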
Routers are able to report on what they are doing, and using them would make it easy to produce good statistics about network utilisation, which would allow us to highlight areas of heavy traffic, for example, and plan accordingly. Also, it would allow us to locate and murder the aforementioned weenie with the iso images :-).
Using routers at a backbone level would allow people running Access Point/Server nodes to run pretty much whatever they like in terms of networking arrangements locally, and translate at the interface that talks to the nearest routing node. People could run IPv4 with DHCP, for instance, then use IPv4 to IPv6 bridging software to connect to the mesh.
If there is more than one subnet within the mesh (and frankly, a single subnet would be unscalable), then if we did not use routers, we would need to use Layer 3 switches with separate VLANs assigned per subnet. This could get very messy very fast, and would require adding a route for each of them to the global routing tables (not the act of a good neighbour). Routers, on the other hand, can aggregate routes to subnets that are part of the same larger network into a single route to advertise to the rest of the world. This is the Right Thing to do.
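To make route aggregation concrete, here is a small sketch using Python's standard `ipaddress` module. The four subnets are made-up private IPv4 ranges chosen for illustration, not an actual mesh allocation.

```python
import ipaddress

# Four hypothetical adjacent subnets inside the mesh.
subnets = [
    ipaddress.ip_network("192.168.64.0/21"),
    ipaddress.ip_network("192.168.72.0/21"),
    ipaddress.ip_network("192.168.80.0/21"),
    ipaddress.ip_network("192.168.88.0/21"),
]

# collapse_addresses merges adjacent networks into the smallest set of
# covering prefixes; here the four /21s fold into a single /19.
aggregated = list(ipaddress.collapse_addresses(subnets))
print(aggregated)
```

A border router would then only need to advertise the single aggregated route, rather than one route per subnet. This only works when the subnets are numerically adjacent, which is why hierarchical address allocation matters.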
The implications of not using routers... [WIP]
In David Leonard's words:
- simplicity / standard / openness / availability of implementation
- robustness / responsiveness
- very conservative of bandwidth
- works with ipv6
- provide accounting
We probably need to expand upon and further define these.
It is important to note that probably we will not get to make design decisions at a network backbone level for some time yet. Probably people who get links up over the next few months will use whatever they can get to work. It is only when we see enough links up that it is possible to start talking about a backbone, that a co-ordinated approach will become important.
[Some more points here.]
We are not going to get to make network design decisions in the early stages, even if we allow that someone can make such decisions. The mesh is going to be more evolved than designed. People are going to set up their nodes and talk with their neighbours using whatever technology they can get to work. This is not a bad thing.
It is likely that small clusters of people will end up with mutual connectivity in the short term. These small areas will again probably be using whatever they can get to work. While we are looking at using IPv6 over the mesh as a whole, I see enough IPv4 talk on the mailing list to suggest that these communities are probably going to be talking IPv4 among themselves. This is not a bad thing, either.
When these communities start to put up links to other local areas, then we are going to have some new options:
I think no-one wants 1 or 3. 1 would mean everyone has to read up on routing, read the RFCs, etc, and since very few people will actually do that, 'mere anarchy is unloosed upon the mesh... the best lack all conviction, while the worst are full of passionate intensity'. Also, novice networkers running BGP could pollute the global routing tables and get us into serious trouble.
3 has merits despite being (probably) politically impossible. I think most people, when they talk about doing network design and about the scenario where we create our own routing protocol, are really thinking in terms of 3. But I think it won't happen, and there is really no way to make it happen.
So probably we are looking at 2. What we will want as a precondition for 2 to work well is a hierarchical approach to address allocation, and I think this has been met pretty well by David and Robert.
The next step is to decide on a protocol. I am leaning towards recommending OSPFv3 for IPv6, since it seems to do what we want and has the great virtue of already existing. But there needs to be consensus, and this document exists to provide enough of the background for people to make an informed decision.
The step to follow is this: a few people with the interest and the ability need to read the RFCs and other documents thoroughly enough to become competent to debate and ultimately write a set of 'Recommended Guidelines' for local areas connecting to the mesh to follow. People running routing nodes would still have to configure their own boxes, but with clear guidelines this will be much easier.
[ this section needs work, obviously...]
A number of people on the list have discussed the prospect of using DHCP at Server nodes. I think if this were done at this point we would be talking about doing it with IPv4, and running an IPv4 to IPv6 bridge at the nearest router.
This is an attempt to fill in, by way of background, those areas that it would be a good idea for every person running a routing node to be familiar with.
The simplest computer network would consist of just two computers connected directly together. Connecting many computers together generally requires additional hardware, and in the case of the rather large internetwork with which we are all familiar, what might work well on a small scale becomes quite inadequate very quickly. The simplest way to start would be to describe the different types of network hardware in terms of how they fit together, but mostly in terms of what they do:
In doing so, I will make some references to the OSI Reference Model, so it's probably worthwhile having a quick look at that first.
Hubs are multiport devices that act on Layer 1 of the OSI Model only and know absolutely nothing about the higher levels. They take a signal from one port and send it to every other port (amplified to a nominal level). Hubs are often referred to as repeaters, because that is exactly what they are.
For all machines connected via a hub to talk to each other, they must all be part of the same subnet (unless some machine connected to the hub is acting as a router).
Bridges are able to act on layer 2 of the OSI Model and as such are able to keep track of the hardware MAC addresses of machines that talk to them. This means that a bridge connecting two network segments is able to keep traffic within those segments separate, but still pass traffic between them.
Switches are very much like bridges, except that they are able to create virtual circuits between their interfaces. Unlike a hub, which sends all traffic to all its interfaces, a switch keeps track of which MAC addresses talk to which interface, and is therefore able to send traffic just to the interface it needs to go to.
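The MAC tracking described above can be sketched as a toy model. The class and method names here are invented for illustration, and real switches also age out table entries, which this sketch ignores.

```python
class LearningSwitch:
    """Toy model of a switch that learns which MAC address lives on which port."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Process one frame and return the set of ports it is sent out of."""
        # Learn (or refresh) where the sender is.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: behave like a hub and flood.
            return self.ports - {in_port}
        if out_port == in_port:
            # Destination is on the segment the frame came from: drop it.
            return set()
        return {out_port}


switch = LearningSwitch(ports=[1, 2, 3])
first = switch.receive(1, "aa:aa", "bb:bb")   # bb:bb unknown, so flood
reply = switch.receive(2, "bb:bb", "aa:aa")   # aa:aa was learned on port 1
second = switch.receive(1, "aa:aa", "bb:bb")  # bb:bb is now known on port 2
```

The first frame is flooded like a hub would do it; once both machines have spoken, traffic between them only touches the two ports involved.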
Like hubs and bridges, switches can only convey traffic to and from machines on the same subnet (though several subnets might be present on the network segment to which the switch is connected).
Switches can support some more advanced features (such as VLANs, where individual ports can convey traffic for different subnets to a router via the same uplink), but these are probably not relevant to the mesh.
Routers are layer 3 devices that know about network protocols (like IP). A router can take a packet, see that its destination is an IP address in a particular network, and send it out via an interface that can reach that network.
For instance, a router receives a packet from a machine with the IP address 172.19.199.31 in the 172.19.192.0/18 subnet, whose destination is a machine with the IP address 192.168.69.42 in the 192.168.69.0/24 subnet. It might not know about that subnet directly, but it does know a route to the larger network 192.168.64.0/19, either because it was configured with a static route or because it learned the route from other routers exchanging information via a routing protocol. So it sends the packet out via that route, trusting the other routers on the way to locate the correct subnet.
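The forwarding decision in this example can be sketched with Python's standard `ipaddress` module. The routing table contents and next-hop names below are hypothetical; the rule of preferring the most specific matching route (longest prefix match) is how IP routers choose between overlapping routes.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the (network, next_hop) entry with the longest prefix that
    contains the destination, or None if no route matches."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routing_table if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical table: a default route, the router's own subnet, and the
# aggregate route it was told about.
table = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),
    (ipaddress.ip_network("172.19.192.0/18"), "directly-connected"),
    (ipaddress.ip_network("192.168.64.0/19"), "router-b"),
]

network, next_hop = lookup(table, "192.168.69.42")
print(f"forward via {next_hop} (matched {network})")
```

The router never needs an entry for 192.168.69.0/24 itself; the /19 route is enough to get the packet one hop closer.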
In fact, the internet manages to function precisely because such higher order addresses are possible. This is the aim of the game when working out IP addressing to match a network's topology.
IPv4 classful addressing, while it is still in use to a certain extent, is obsolete and should not be used. I will only refer to it in terms of classless addressing and Classless Inter-Domain Routing (CIDR).
CIDR style addressing was introduced in 1993 via RFCs 1517, 1518, 1519 and 1520 and was developed in response to the problems associated with classful addressing. It was rapidly and generally deployed in 1994-5 to deal with (what was felt to be) the imminent depletion of IPv4 address space. Since then it has been more or less the standard addressing scheme for the internet.
Classless addressing supports arbitrary sized networks, unlike classful addressing which only supported networks of specific sizes (although improvements such as subnetting and VLSMs helped, these can be seen as developments towards CIDR). A network is identified by its network address and a mask describing the number of bits in the network prefix. This mask is very much the same creature as a subnet mask under classful addressing, but now takes the place of the classful addressing altogether. While the term 'subnet mask' refers back to classful addressing, under CIDR, any network is really just a subnet of 0.0.0.0/0, the global address space. I will use the term 'subnet mask' to describe what is in effect the mask that defines any network (sub- or otherwise :-).
For example, an interface might be configured with the address 192.168.195.28 and a subnet mask 255.255.240.0 (ie, a 20 bit mask). This makes the address part of the 192.168.192.0/20 subnet, whose address range is 192.168.192.0 (the network address) to 192.168.207.255 (the broadcast address -- only addresses in between these are available for interface addressing). Now suppose a host somewhere out there on the internet wants to connect to 192.168.195.28 (this should not happen -- the 192.168.0.0/16 address space is reserved and not routable). It will pass its packets on to a router, and that router needs to know a route to a network that includes the 192.168.192.0/20 subnet. If it knows about a route to 192.168.0.0/16, and no more specific route, it will pass the packet to the router that advertises that route (or is the destination of a static route to that network). The next router might know about a route to 192.168.128.0/17, and pass it on to the router that advertises that route, and so on until the target network is reached.
You will note that subnet masks as I have described them in the preceding paragraph can be represented in two ways. The first is the familiar 'dotted decimal' notation, where the four bytes that make up the address or mask are shown as four numbers from 0 to 255 separated by dots. A mask represented like this will always be the decimal version of a 32 digit binary number where there are all contiguous 1s from the left, and all contiguous 0s from the right. For example, 255.255.240.0 is the dotted decimal notation for the binary 11111111.11111111.11110000.00000000, which has 20 contiguous 1s starting from the left. This leads to the second representation, which has become common since CIDR because it is so much shorter :-) -- the mask can be combined with a host or network address as in 192.168.195.28/20. The /20 represents the number of contiguous bits (1s) in the mask. Applying a bitwise logical AND:
    255.255.240.0   or 11111111.11111111.11110000.00000000 (the subnet mask)
  & 192.168.195.28  or 11000000.10101000.11000011.00011100 (the destination ip address)
yields
    192.168.192.0   or 11000000.10101000.11000000.00000000 (the network address)
and the broadcast address is
    192.168.207.255 or 11000000.10101000.11001111.11111111
It's called a mask because the above operation 'masks out' the host portion, leaving only the network address. All hosts on the same subnet yield the same number (the network address) when ANDed with the subnet mask. If a host has the address 192.168.208.1, ANDing it with the subnet mask yields 192.168.208.0, which is a different subnet.
The /20 simply says that the first twenty bits are the network part of the address, while the remaining 12 bits are the host interface part. The network address is the address where the host portion is all zeros, while the broadcast address is the address where the host portion is all ones.
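The masking arithmetic above can be reproduced in a few lines. This is only a sketch: the helper names are made up for the example, and the result is cross-checked against Python's standard `ipaddress` module at the end.

```python
import ipaddress


def to_int(dotted):
    """Convert dotted decimal notation to a 32 bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32 bit integer back to dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def prefix_to_mask(prefix_len):
    """Build a mask with prefix_len contiguous 1s from the left."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


mask = prefix_to_mask(20)
address = to_int("192.168.195.28")
network = address & mask                     # host portion set to all zeros
broadcast = network | (~mask & 0xFFFFFFFF)   # host portion set to all ones
other = to_int("192.168.208.1") & mask       # lands in a different subnet

print(to_dotted(mask), to_dotted(network), to_dotted(broadcast), to_dotted(other))

# Cross-check against the standard library.
check = ipaddress.ip_interface("192.168.195.28/20").network
```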
That's the story with IPv4, anyway, and it is what is in general use on the internet today. IPv6, as it happens, merely extends the same functionality and mechanisms. Classfulness is dead and buried. IPv6 uses 128 bit addresses rather than the 32 bit addresses described above.
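To show that the same prefix mechanics carry over, here is a short sketch using Python's standard `ipaddress` module. The addresses are taken from 2001:db8::/32, the range reserved for documentation, and the /48 and /64 split is just an example, not a proposed mesh allocation.

```python
import ipaddress

# A hypothetical interface address with a 64 bit network prefix.
interface = ipaddress.ip_interface("2001:db8:1234:5678::1/64")
network = interface.network

# A hypothetical larger allocation that the /64 is carved out of.
site = ipaddress.ip_network("2001:db8:1234::/48")

print(network)                  # the /64 the interface belongs to
print(network.max_prefixlen)    # address length in bits
print(network.subnet_of(site))  # whether the /64 falls inside the /48
```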
[ There should be more stuff about IPv6 here... ]
[ Needs to be fixed so the terminology is more consistent and there is less talk about it ]
History

IP was first standardised in 1981. The original specification separated the 32 bit address into a Network-number (or Network Prefix) and a Host Number. The relative sizes of these depended on the address 'class'.

Classful Addressing

While you still sometimes hear subnets described as 'Class C size', for instance, or 'Class B size', the conventions of classful addressing are obsolete and should be avoided wherever at all possible. This section is for background information only. The three classes of interest are as follows:

Class A networks

Under classful addressing, if the first bit is set to zero (ie, the address is in the range 0.0.0.0 - 127.255.255.255, or approximately 50% of the total address space) then there is an 8 bit network prefix and a 24 bit host number. There can be 2^(8 - 1) = 2^7 = 128 Class A networks, with 2^24 = ~16 million addresses per network.

Class B networks

If the first two bits are set to 10 (ie, the address is in the range 128.0.0.0 - 191.255.255.255) then the network prefix has 16 bits, as does the host number. Class B addresses comprise approximately 25% of the total address space. There can be 2^(16 - 2) = 2^14 = ~16 thousand Class B networks, with 2^16 = ~65 thousand addresses per network.

Class C networks

Where the first 3 bits of the address are 110 (ie, the address is in the range 192.0.0.0 - 223.255.255.255) then the network prefix is 24 bits and the host number is 8 bits. There can be 2^(24 - 3) = 2^21 = ~2 million Class C networks, with 2^8 = 256 addresses per network.

Note that certain addresses are reserved, so the actual number of addresses available for use is less than the figures above. This also leaves aside Class D and E networks (determined by the first four bits being set to 1110 and 1111 respectively). These are not available for the purposes of assigning addresses.

Don't think in terms of classful addressing -- it will bite you. IPv6 is based on classless addressing, as is IPv4 post CIDR.
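Purely as background, the leading-bit rules for the historical classes can be sketched as below. The function name is made up for the example, and nothing on the mesh should depend on this.

```python
def address_class(dotted):
    """Return the historical class (A to E) of an IPv4 address, decided
    by the leading bits of its first byte."""
    first_octet = int(dotted.split(".")[0])
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # leading bit 0
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # leading bits 10
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # leading bits 110
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # leading bits 1110
    return "E"      # leading bits 1111


for example in ("10.0.0.1", "172.19.199.31", "192.168.69.42", "224.0.0.5"):
    print(example, "was class", address_class(example))
```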
The OSI (Open Systems Interconnection) Model is a standard seven layer description of the abstractions involved in network protocols. The layers are:
For the purposes of this document, only the lowest three layers are interesting. I am leaving the others blank for now.
There are two different ways to categorise routing protocols. On the one hand there is a distinction between Interior Gateway Protocols (IGPs) and Exterior Gateway Protocols (EGPs), of which the Border Gateway Protocol (BGP) is the one in general use. Note that 'gateway' is a synonym for 'router'. IGPs are protocols used within an Autonomous System (AS). An AS is a group of routers acting together with shared responsibilities. EGPs act, as you'd guess, between ASs. In our case that would be between the nodes that talk to the internet and the internet itself. For our purposes, it is simplest to think of the whole mesh as a single AS. This means that BGP is not going to be of interest to us until we have multiple connections to the internet and we want to advertise external routes. Given the current (2001-11-02) debate on the mesh mailing list, I think this will never happen.
On the gripping hand, there is a distinction among IGPs between Distance Vector and Link State routing protocols. In simple terms:
Distance Vector protocols calculate paths based on a routing table which includes all routes within the AS. Whenever a change occurs, the router that notices it rebuilds its table and transfers it to the other routers.
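As a rough sketch of the distance vector idea, the function below shows how one router might fold a neighbour's advertised table into its own. The data layout and names are invented for illustration, and real protocols such as RIP add timers, a maximum hop count and other safeguards that are left out here.

```python
def merge_advertisement(own_table, neighbour, link_cost, advertised):
    """Update own_table (destination -> (cost, next_hop)) from a table
    advertised by a neighbour (destination -> cost).

    Returns True if anything changed, in which case this router would in
    turn send its own updated table to its neighbours."""
    changed = False
    for destination, cost in advertised.items():
        candidate = cost + link_cost
        current = own_table.get(destination)
        better = current is None or candidate < current[0]
        # If we already route via this neighbour, accept its new cost
        # even when it is worse than before.
        updated = (current is not None and current[1] == neighbour
                   and candidate != current[0])
        if better or updated:
            own_table[destination] = (candidate, neighbour)
            changed = True
    return changed


table = {"net-a": (1, "direct")}
first = merge_advertisement(table, "router-2", 1, {"net-a": 5, "net-b": 1})
again = merge_advertisement(table, "router-2", 1, {"net-a": 5, "net-b": 1})
```

After the first call the router has learned a route to net-b via router-2 at cost 2 and kept its cheaper direct route to net-a; the repeated advertisement changes nothing.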
Distance Vector protocols include:
RIP (and when most people talk about RIP they are talking about RIP2), is the single most widely used routing protocol in existence. All the unix-like OSs and the NT family have built in support for RIP. In terms of availability, this rates the highest.
This is a proprietary Cisco protocol that is only available on Cisco routers [Check this!!!].
Link State protocols maintain an individual 'path cost' database on each router. Each router updates its own database whenever a link changes state, and the affected routers send out a Link State Advertisement to notify the others.
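A minimal sketch of what a router does with such a database: run a shortest path calculation (Dijkstra's algorithm, which is what OSPF uses) over the link costs, and run it again after a link changes state. The three-router topology, the costs and the function names are made up for the example.

```python
import heapq


def shortest_paths(link_state_db, source):
    """link_state_db maps router -> {neighbour: link cost}.
    Returns router -> (total cost, first hop from source)."""
    best = {source: (0, None)}
    queue = [(0, source, None)]
    while queue:
        cost, router, first_hop = heapq.heappop(queue)
        if cost > best[router][0]:
            continue  # stale queue entry; a cheaper path was already found
        for neighbour, link_cost in link_state_db.get(router, {}).items():
            new_cost = cost + link_cost
            hop = neighbour if router == source else first_hop
            if neighbour not in best or new_cost < best[neighbour][0]:
                best[neighbour] = (new_cost, hop)
                heapq.heappush(queue, (new_cost, neighbour, hop))
    return best


def without_link(link_state_db, x, y):
    """Return a copy of the database with the link between x and y removed."""
    return {router: {n: c for n, c in neighbours.items() if {router, n} != {x, y}}
            for router, neighbours in link_state_db.items()}


# Hypothetical topology: a-b and b-c are cheap links, a-c is expensive.
db = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"a": 5, "b": 1},
}
before = shortest_paths(db, "a")
after = shortest_paths(without_link(db, "a", "b"), "a")
```

Before the failure, router a reaches c through b at a total cost of 2. Once the a-b link is withdrawn, the recalculation sends everything over the expensive a-c link instead, which is the "altered routing table" behaviour described earlier.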
Link State protocols include:
OSPF is described in RFCs 2328 (for the latest OSPFv2), and 2740 (for OSPFv3, which is for IPv6). OSPF is a link state routing protocol created specifically for complex IP networks.
http://www.freesoft.org/CIE/Topics/89.htm contains a brief summary, and http://www.juniper.net/techpubs/software/junos50/swconfig50-routing/html/ospf-overview.html another.
OSPF does the following:
OSPF uses areas, and all routers are by default in 'area 0'. Usually, this is reserved for the backbone network, and separate areas connected to it have higher numbers. Each area has its own DR (Designated Router), if required, and only propagates a summary of its topological database to area 0 routers.
OSPF supports authentication, which makes it possible to prevent routes from being hijacked by a rogue router.
OSPF supports Type of Service (TOS) routing, by which different kinds of packet can be tagged and routed in different ways. For instance, game traffic could be sent over an ultra low-latency but low-bandwidth link, while file transfers could be sent over a high bandwidth but higher latency link. This requires support from the application layer (OSI Layer 7).
Hybrid protocols use a combination of DV and LS techniques. They include:
Last modified: Sun Nov 4 08:19:10 2001