Arrangement of Functions

A comment on The Design Philosophy of the DARPA Internet Protocols (hereafter DP) and End-to-End Arguments in System Design (hereafter E2E). This was originally a school assignment.

Preface

In the world of software engineering, there are no silver bullets. (Fred Brooks)

The world of IT is characterized by the high variability of its needs and the high complexity of its systems. As a consequence, no one can tell in detail what the final architecture of an IT system will be. The only way to fulfill the needs, then, is to iterate in small but steady steps, while always staying ready for revision and change.

Ontogeny recapitulates phylogeny. I believe it is also vital that we students take small iterations. I find it hard to dissect a huge in-use architecture, but much easier to learn from the point when it was just born. Along the developmental timeline, we get to see how the ideas and architectures changed, unveiling concerns and trade-offs that are very hard to see, or have become intricate, in the final product.

The Internet walked exactly this path of incremental development. Its vitality came from careful choices of technique and quick responses to newly arising problems. Throughout the process, a great amount of human wisdom has been devoted to the Internet.

Having read these thousands of words, I came to a conclusion: many designs of the Internet were about a rational, if not optimal, arrangement of functions. Any complex system has to arrange a wide range of related functions. Without clever methods, the functions mingle together, resulting in exponential growth of complexity and coupling, making it impossible to change one part a bit without crashing another. Therefore, in this comment I felt it meaningful to discuss the topic of arranging functions when designing a system.

I was pondering: what was the point of writing a less brilliant article when two greater papers had already presented their thoughts? At first, I decided to make a simple summary of the papers. Yet as I was reading, numerous thoughts arose, which constitute most of the following content. Those thoughts were tightly connected to my own learning experience. That was when I realized that everyone reads a paper with a set of knowledge and understanding unique to them. The purpose of this comment is not to substitute for the papers, but to provide a more comprehensive viewpoint and to augment their points with my own examples, hopefully helping others gain a deeper understanding of the philosophy of software engineering.

Long Live Encapsulation

Keep It Simple, Stupid.

The users' needs are hard to understand

The world of software changes fast, faster than anyone could have thought. As a consequence, it is far too hard to make a complete predictive plan before implementation [UML Fowler]. The Internet was not designed for worldwide use, nor did it put speed or security in first place. Yet these have been among the hottest issues in networking in recent years. This reminds us that we cannot expect to do all the requirement analysis once and for all.

At the same time, its [the Internet's] success has made clear that in certain situations, the priorities of the designers do not match the needs of the actual users. (DP)

The actual users' needs are hard to understand. This problem prevails not only in web application development but also at lower levels. I do not want to discuss how to identify the needs. Instead, I want to discuss how we should understand the word encapsulation, so that our vision is not restricted by any single method.

Encapsulation takes no fixed form

The Internet arranges its functions in layers. Layering is one of the fundamental aspects of the Internet. Note that layering, although it has little to do with object-oriented design, can also be considered a form of encapsulation.

The purpose of encapsulation is to reduce redundant work when updating, modifying or removing functions in a system. Note, however, that it is a means, not an end. It is intended to cut away unnecessary work, minimizing verbosity and leaving only the truly necessary parts to be done manually. I call this the Eternal Goal of All Programmers. I know it sounds trivial, but it is the sole criterion we can rely on to judge whether we are developing in the right direction.

A common mistake is to take encapsulation as a concept unique to object-oriented programming. After reading about the Internet, we can see that this is wrong. The core of encapsulation is to reduce inter-module links (associations). Even a procedural approach can integrate closely related steps into procedures.

Back to the topic of the Internet. Encapsulation lives in its layer system. The layers conform to the Single Responsibility Principle very well. By hiding potentially countless functions inside a layer and exposing only limited primitives to other layers, higher layers can presume that their needs are well met below, with little concern for the other levels. This makes it possible to develop a huge system within a reasonable amount of time.
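
To make the idea concrete, here is a minimal sketch of layering as encapsulation. The class and method names are my own invention, not real protocol APIs; the point is only that each layer calls one narrow primitive of the layer below and knows nothing else about it.

```python
# Illustrative sketch only: hypothetical layer classes, not real protocol APIs.

class NetworkLayer:
    """Hides routing, addressing, fragmentation... behind one primitive."""
    def send_packet(self, dest: str, payload: bytes) -> None:
        # Countless details (routing tables, TTL, fragmentation) would live
        # here, invisible to the layers above.
        print(f"[network] packet to {dest}: {payload!r}")

class TransportLayer:
    """Uses only the network layer's primitive; exposes its own to apps."""
    def __init__(self, network: NetworkLayer):
        self._network = network

    def send(self, dest: str, data: bytes) -> None:
        # Segmentation, retransmission, flow control would be hidden here.
        self._network.send_packet(dest, data)

class Application:
    """Presumes its needs are met below; never touches NetworkLayer directly."""
    def __init__(self, transport: TransportLayer):
        self._transport = transport

    def greet(self, host: str) -> None:
        self._transport.send(host, b"hello")

Application(TransportLayer(NetworkLayer())).greet("198.51.100.7")
```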

Total encapsulation is too difficult

Another common mistake is trying to encapsulate everything, like trying to hide a bad haircut under a cap. In fact, from a philosophical perspective, all components in a system are associated to some degree. There are times when we find that an application-layer function has to bypass the transport layer and touch the specific IP protocol, or even the ports. How can we decide whether such behavior is sinful?

The Internet is a good example. Although in concept upper layers should only use the primitives provided by the network layer, they are often tightly coupled with the IPv4 protocol. My teacher told us this is part of the reason why IPv6 has met so much frustration in gaining ground. But the IP protocol is still one of the cornerstones of the Internet, just as packet switching, not circuit switching, is.

This does not mean that breaking the layering in special cases, e.g. out of performance concerns, is simply okay and that we no longer need to care about function arrangement. Even an inconspicuous thread, when pulled, can unravel a whole shirt.

A nicer approach may be to add a side system parallel to all layers. It can provide cross-layer access and global optimization. With such a side system, the functions in the system stay organized, even though some extra coupling is introduced. I shall discuss side systems in a later chapter.

Encapsulation helps arrange functions dynamically

A good arrangement of functions varies dynamically according to different needs. As we shall see in the following chapters, the requirements of the Internet are often intricate, and sometimes conflicting. Therefore, we cannot hope for a one-size-fits-all module to stay forever in the structure of the Internet. Quite the contrary, the Internet emphasizes the ability to replace one layer implementation with another without affecting other modules, which is exactly what encapsulation aims at. Encapsulation, as a result, forms the foundation of all the following discussions.

The Trinity: Requirements Can Be Contradictory

Can we build a computer that's both cheap and fast?

Can the Internet have high throughput, low latency and high reliability at the same time?

DP presented eight goals. I summarized them into a triangle: the throughput-latency-reliability trinity. At any specific point in time, the available resources are limited. Sometimes we have to assign priorities to conflicting needs, so decisions must be made on which to prefer and which to subdue. According to DP, such decisions often affect the outcome greatly.

This conflict does not mean that generality is impossible. Since different use cases give different priorities to the different dimensions, a good solution is to provide numerous building blocks with unified interfaces, each allocating a different proportion of resources to the three vertices of the triangle. This way, we still confine functions within the right borders while achieving generality. Practical protocols and infrastructures in each layer of the Internet also take more factors into account, such as maintenance cost.
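
As a hedged sketch of what I mean by building blocks with unified interfaces (the class names and the "profile" numbers are invented for illustration, not taken from any real protocol):

```python
# Illustrative sketch: interchangeable building blocks behind one interface.
# Class names and "profile" numbers are invented; real protocols weigh
# throughput, latency and reliability in far subtler ways.
from abc import ABC, abstractmethod

class Channel(ABC):
    @abstractmethod
    def send(self, data: bytes) -> None: ...

class BulkChannel(Channel):
    """Favors throughput and reliability, tolerates latency (file transfer)."""
    profile = {"throughput": 0.6, "latency": 0.1, "reliability": 0.3}
    def send(self, data: bytes) -> None:
        print(f"bulk: batching {len(data)} bytes, retransmitting on loss")

class RealtimeChannel(Channel):
    """Favors latency, tolerates loss (voice, games)."""
    profile = {"throughput": 0.2, "latency": 0.7, "reliability": 0.1}
    def send(self, data: bytes) -> None:
        print(f"realtime: sending {len(data)} bytes immediately, no retry")

def deliver(channel: Channel, data: bytes) -> None:
    # The caller depends only on the unified interface, so a block tuned for
    # a different vertex of the triangle can be swapped in without touching
    # this code.
    channel.send(data)

deliver(BulkChannel(), b"a large file")
deliver(RealtimeChannel(), b"a voice frame")
```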

The V curve

Often, the overall performance is not monotonic in a specific factor. Rather, the graph is often an inverted-V curve: performance peaks only when the factor is neither too high nor too low.

Another case of the V curve concerns hybrids. The basic idea is that methods and algorithms with low asymptotic complexity often take more time to execute when the data size is small, while those that fit small data sets perform poorly on large ones. So we conduct experiments to find the crossover point and apply the fittest method on each interval. The individual methods form the V curves, and we always take the most efficient part of each (consider the two curves \(y=x\) and \(y=x^2\) on \([0, 1)\) and \([1, +\infty)\)).

In the Internet, the efficiency of error checking is a big issue. If we rely on purely end-to-end checking (i.e. not checking for errors at the switches), the cost grows exponentially as the distance, the amount of data and other disturbances increase. That cost is unacceptable. On the other hand, if we implement error checking at every router and packet switch, the delay becomes too large. A hybrid approach gives a better outcome.
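
To make the cost claim a bit more concrete, here is a back-of-the-envelope model of my own (it is not taken from DP or E2E, and the independence assumption is a simplification): suppose each of \(k\) hops corrupts a packet independently with probability \(p\), and the file consists of \(n\) packets. A purely end-to-end scheme that retries the whole file upon any error succeeds in one attempt with probability \((1-p)^{kn}\), so it expects \((1-p)^{-kn}\) attempts on average, a cost that explodes exponentially in both the path length and the file size. Checking at every switch avoids that explosion but pays with extra processing delay on every hop, which is why a hybrid wins.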

The same happens in many areas. Practical sorting algorithms also adopt hybrid approaches. In libc++, std::sort performs insertion sort on small ranges, quicksort in the common case, and falls back to heapsort when quicksort deteriorates.
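
Here is a hedged sketch of the hybrid idea in code. The threshold of 16 is a placeholder I chose for illustration (real libraries pick their cut-offs by measurement, exactly the crossover-point experiment described above), and the heapsort fallback is omitted for brevity:

```python
# Illustrative hybrid sort: insertion sort below a small-size threshold,
# quicksort above it. The threshold 16 is a placeholder; in practice the
# crossover point is found by measurement, as the V-curve argument suggests.
SMALL = 16

def insertion_sort(a, lo, hi):
    # Sort a[lo..hi] in place; fast for short, nearly-sorted ranges.
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key

def hybrid_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        if hi - lo + 1 <= SMALL:           # small interval: insertion sort wins
            insertion_sort(a, lo, hi)
            return
        pivot = a[(lo + hi) // 2]          # plain middle-element pivot for brevity
        i, j = lo, hi
        while i <= j:                      # classic bidirectional partition
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
        hybrid_sort(a, lo, j)              # recurse into one half,
        lo = i                             # loop on the other to limit stack depth

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 5
hybrid_sort(data)
assert data == sorted([5, 3, 8, 1, 9, 2, 7, 4, 6, 0] * 5)
```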

The common question behind these approaches is which functions to place and where to place them. This complements the previous chapter, where we discussed how to support easy updating, changing and removal of functions.

Side Systems: Mechanisms That Linger Around

In the previous two chapters I discussed two aspects of arranging functions, taking the Internet as the example. But the true wisdom lies in how the Internet architecture fuses the two parts together and balances the degree to which it adopts each.

Strict modularization entails weaker control

It is common to see people overuse generalization. Consider the differences between TCP and UDP. Yes, they are both communication protocols in the transport layer, but are we confident enough to say that upper-level applications need not worry about whether to use TCP or UDP? I think not. They have distinctive characteristics. As I said, good modularization enables us to replace TCP with UDP without changing the other layers. But even if we succeeded in transplanting the unfit protocol, a big question mark must be placed on the usability and robustness of the outcome. This poses a new argument: strict modularization entails weaker control.

I have always believed that lower levels can hide details and provide simpler interfaces for easier use. However, they should also always allow the upper levels to dig down to the bottom whenever necessary. Restricting this ability only results in more obscure workarounds that bypass the man-made restriction. Consider how Java programmers optimize: it is just weird that you have to resort to other languages through JNI. In C#, by contrast, there are ref parameters and stack-allocated values, granting more control to the programmer. This is only a small example, meant to show that the upper levels may always turn out to need finer-grained control.

Side systems to bypass layering or modularization

It is common to see such mechanisms in Java and C#. One notorious example is reflection. In the Internet architecture there is also such a phantom loitering around.

Consider error detection and recovery in the Internet.

We know there is a chance that bits get inadvertently flipped. E2E poses the question: should we verify the checksum at every node in the network? E2E preferred a higher-level approach. Although the error is essentially a physical, very low-level error, the cost of fixing it at the same level is too high to be acceptable. Furthermore, other errors may occur during transmission, some of which belong to higher levels, and the link layer has no idea how to fix them.

E2E pointed out a strong observation: even if we implement error-processing systems in the lower layers, we still have to do so in the higher layers. The paper considered it an obvious pattern that errors in lower levels should be propagated upwards; we should not attempt to strangle them down there (otherwise we often pay a higher cost for no good). Another aspect, optimization, also favors handling errors in higher layers, where more information can be analyzed, allowing for the potential of global optimization.
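
As a minimal sketch of the end-to-end check (the digest choice and the retry behaviour here are my own illustration, not the paper's mechanism): only the endpoints can confirm that the data that arrived is the data that was sent, regardless of what the intermediate hops did or did not verify.

```python
# Illustrative end-to-end check: the sender attaches a digest of the whole
# message; the receiver recomputes it after reassembly and asks for a resend
# on mismatch. Per-hop checks (if any) are an optimization, not the guarantee.
import hashlib

def send_with_digest(payload: bytes):
    return payload, hashlib.sha256(payload).digest()

def receive(payload: bytes, digest: bytes) -> bytes:
    if hashlib.sha256(payload).digest() != digest:
        raise IOError("end-to-end check failed, request retransmission")
    return payload

payload, digest = send_with_digest(b"file contents")
corrupted = b"f1le contents"            # a bit flip somewhere along the path
try:
    receive(corrupted, digest)
except IOError as e:
    print(e)                            # only the endpoints can catch this
print(receive(payload, digest))         # a clean transfer passes the check
```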

I want to go one step further. It is clear that an error-handling system couples, by nature, with many layers of the Internet. Hence, one easy approach is to extract it into a side system, an onlooker of the busy Internet that is only woken when errors are detected, like a daemon. But of course, programmers hate coupling. We could instead adopt some cleverly conceived design patterns to embed this system naturally, for example the Chain of Responsibility pattern, passing the error upward like a rising bubble until it finds a handler that can deal with it. I personally disagree with the latter approach because it violates the Single Responsibility Principle and scatters a common theme, error handling, into who-knows-where corners.
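
For reference, a hedged sketch of what the Chain of Responsibility variant could look like (the handler names and error strings are hypothetical); this is the very pattern I said scatters error handling across the layers:

```python
# Illustrative Chain of Responsibility: the error "bubbles up" the chain until
# some handler accepts it. Handler names and error strings are hypothetical.
class Handler:
    def __init__(self, successor=None):
        self.successor = successor
    def handle(self, error: str) -> None:
        if self.successor:
            self.successor.handle(error)
        else:
            raise RuntimeError(f"unhandled error: {error}")

class LinkLayerHandler(Handler):
    def handle(self, error: str) -> None:
        if error == "frame checksum mismatch":
            print("link layer: drop the frame, let the end hosts retransmit")
        else:
            super().handle(error)        # pass the bubble upward

class TransportHandler(Handler):
    def handle(self, error: str) -> None:
        if error == "segment lost":
            print("transport: retransmit the segment")
        else:
            super().handle(error)

chain = LinkLayerHandler(TransportHandler())
chain.handle("segment lost")             # handled by the transport handler
```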

I argue that some systems are, by nature, associated with multiple other systems. I discourage attempts to force them into other systems, but I advocate patterns that add mediators (not necessarily the Mediator pattern) to reduce many-to-many coupling, such as the Façade pattern and the Observer pattern. My opinion (which is likely to change as I learn) is that we should not be afraid to add side systems, as long as they conform to our most fundamental goal: arrange functions well so that the system is easy to change and easy to understand.
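
And a hedged sketch of the mediator-style side system I prefer (names invented): every layer reports to one observer-style monitor, so the many-to-many coupling collapses into many-to-one and the error-handling theme stays in one place.

```python
# Illustrative Observer-style side system: every layer reports errors to one
# monitor instead of knowing about every other layer. Names are hypothetical.
class ErrorMonitor:
    def __init__(self):
        self._subscribers = []
    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)
    def report(self, layer: str, error: str) -> None:
        for callback in self._subscribers:
            callback(layer, error)

monitor = ErrorMonitor()
monitor.subscribe(lambda layer, err: print(f"[side system] {layer}: {err}"))

# Any layer can report without knowing who listens or how errors are handled.
monitor.report("link", "frame checksum mismatch")
monitor.report("transport", "segment lost")
```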

Conclusion: The Eternal Goal of All Programmers

In the second chapter I mentioned the Eternal Goal of All Programmers. It may seem a little far from the topic of the Internet, but I think it coheres well with our major: after all, the Internet is closely related to software engineering. In this chapter, I want to share my thoughts on practical design while revisiting the topics discussed above.

The Eternal Goal of All Programmers: Minimize unnecessary labor, doing only creative work.

This goal can be approached by automating routine processes, reusing existing code bases, reusing former designs, exploring more patterns of intelligence, and so on. I have no concrete theoretical basis for this, but intuitively I consider it rational. After all, computers were born precisely to release humans from repetitive, non-creative work. In my opinion, this eternal goal is what guides us in virtually every aspect of software design.

Afternote: Why Learn The Low-Level Part of The Internet?

I have just introduced the Eternal Goal of All Programmers. If that is the case, why would we still learn the basics of the Internet, when it has already been developed and is still maintained by the pioneers? Why not just use the Internet without concern for its history or implementation details?

The answer lies in the methodology itself. It turns out that even creative work requires background knowledge, which is so general and time-tested that it applies not only to the Internet but to anything related to software engineering. Moreover, other industries are also absorbing nourishment from software engineering. For example, SpaceX has adopted an agile development style, running many tests and prototypes in very short cycles and experimenting incrementally with what new technologies can be applied to its spacecraft. The construction industry has also endeavored to modularize, to reduce mold cost and boost building tempo. I was not reading the papers to learn how to re-implement another internet, but to learn exactly how the pioneers came up with the ideas, that is, their mindset.

This focus on the ideas rather than the outcome is also the reason why I did not spend many words describing what the Internet looks like in structure. In the second and third chapters, I discussed two aspects that need thorough consideration when designing multiplexing systems (systems with many functions). In the fourth chapter, I stressed where to place the border between generalization and specialization, and when to break the rules. Obviously, concrete methods are numerous, and many of them even contradict one another. It is the fundamental goal that forms the ruler and compass in our mind. The Internet is a successful and widely used engineering architecture; learning it gives us, as students, the closest possible perception of decision making and planning.

Task 2: The Future of the Internet: The Universal Bus of a Computing Network

By requirement, this question is a different topic from Task 1. Conceptually, however, my prediction of the Internet is also rooted in the same title: arrangement of functions. I think we can find answers to Task 2 along this line. Indeed, it takes both courage and imagination for an undergraduate to predict the future of the Internet, yet I do have confidence that the development will go as I conceive.

Modern development, no matter in what discipline, tends by nature to adopt the most profitable path, not just any profitable path. If the latter were the case, hardware giants like Intel and IBM would not have to keep compatibility when introducing new technologies, and outdated code bases would not need to be maintained even today. The list of imaginary examples could stretch to no end. The point is, in practice we choose the methods that grant us greater cost-efficiency. And down to the core, this is basically a question of where to arrange the functions.

In the earlier years, when Moore's Law was still in its youth, improvements in computing power came rather easily. The Internet then could be used for exchanging messages, which may require low latency but tolerate low throughput, and for transmitting non-instant data such as files, which can happily put up with latency as long as the throughput is high. Whatever the usage, since functions placed close together can communicate with one another better and be optimized more easily, computing power was seldom split apart. Compared with the uncertainty of the Internet, it seemed very clear that arranging the functions as integrated as possible (that is, buying a new CPU instead of two Internet-connected computers) was the most cost-efficient idea.

In recent years, the sprint of single-unit computing power is coming to an end. Calls for multi-core processors and even distributed computation have arisen. Yet quantity itself is far from enough. Distributing finer-grained computing power across the Internet will be the most important topic for a long time, as it is now the most cost-effective way to improve overall computing power. That is, we will arrange functions in a more isolated (encapsulated) and specialized fashion, and the question of what should be in a module and what should not will arise more often. To support better distribution, the Internet first has to adapt to a new role: a fundamental component of a large computing system. By analogy, the Internet will be the universal bus of a giant, cell-based computing network (not computer network).

Why must computers be distributed? When computing power was cheap to improve, it seemed sensible to integrate numerous varied calculation and logic units onto one chip, the CPU. But as the cake grows reluctant to get bigger, more focus should be turned to how to divide the power. As I see it, the future belongs to special-purpose computing units. Mainstream GPUs have already taken their first steps by presenting programmable mechanisms. If chips can do so, why not computers themselves? Different chips or computers can perform different but specialized calculations, each targeting only one type of task. Such computers can be interconnected by the Internet (perhaps through some intermediate networks) and provide shared functionality across the whole system. The Internet itself will then be the highway not only for traditional information, but also for calculation requests and semi-finished calculation results. A single computer, restricted by its size and locality, can surmount that weakness by distributing its power across the network, with the highest cost-efficiency. The resulting Internet will be dotted with data centers, AI centers, graphics centers, physics centers and so forth. The users, now interacting with "intact" terminals that utilize the Internet, will instead interact with "entries" that are themselves the façade of the Internet.

I am not an expert, but here are two points I regard as significant.

  • Pseudo-local networks. Although there exists this thing called a VPN, their purposes differ. In a pseudo-local network, the end systems (or the network itself) should be aware of the network's components. The purpose is to allocate resources to users optimally. For example, when a gamer connects to the Internet through a façade, the network (or a conceptual "network resource monitor") will search for and choose the best available computing units and assemble them into an imaginary pseudo-local network, providing the illusion that the gamer has access to an intact computer.
  • Primarily, improved reliability. The Internet should fail less often, through more robust proxies, protocols, or other means. Since much more data, even semi-finished calculation results, will be traveling on the Internet, it becomes vital that the Internet provide correctness comparable to that of a PCIe bus.

I know these goals are hard to achieve, and perhaps there are other workaround approaches. However, I do anticipate that this generalization of the concept of a computer is going to happen. If so, the Internet should be getting ready for the new challenge.

References

  • UML Fowler: Martin Fowler, UML Distilled, 3rd Edition, Chapter 2.