How does the Internet work?

From Web Education Community Group

Introduction

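Every so often, you get the chance to look at the cogs and fan belts behind the action. Today is your lucky day, because this article takes you behind the scenes of one of the hottest technologies around, one you are probably already familiar with: the World Wide Web. Cue theme music.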

This article covers the underlying technologies that power the World Wide Web:

  • Hypertext Markup Language (HTML)
  • Hypertext Transfer Protocol (HTTP)
  • Domain Name System (DNS)
  • Web servers and web browsers
  • Static and dynamic content

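While most of what is covered here won’t help you build a better web site, it will give you the proper language to use when talking to clients and others about the Web. As the wise nun-turned-nanny in The Sound of Music put it: “When you read you begin with A-B-C, when you sing you begin with Do-Re-Mi.” In this article, I will briefly explain how computers communicate with each other, then look at how the different languages fit together to create the web pages that make up the Web.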

How do computers communicate via the Internet?

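Thankfully, we have kept communication between computers simple. On the World Wide Web, most pages are written in the same language, HTML, and are passed around the network using a common protocol, HTTP. HTTP is the common language (dialect, or specification) of the Internet; it allows, for example, a computer running Windows to sing in harmony (Do-Re-Mi) with a computer running the latest version of Linux. A web browser is a special piece of software that interprets HTTP and renders HTML into a human-readable form, so pages authored in HTML on any type of computer can be read anywhere, including on telephones, PDAs and even video game consoles.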

Even though they’re speaking the same language, the various devices accessing the web need to have some rules in place to be able to talk to one another — it’s like learning to raise your hand to ask a question in class. HTTP lays out these ground rules for the Internet. Because of HTTP, a client machine (like your computer) knows that it has to be the one to initiate a request for a web page; it sends this request to a server. A server is a computer where web sites reside — when you type a web address into your browser, a server receives your request, finds the web page you want, and sends it back to your computer to be displayed in your web browser.
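The exchange just described can be sketched with Python’s standard library. This is only an illustration under assumptions: the page content, the handler name `PageHandler` and the helper `fetch_once` are made up for the example, and a throwaway server on 127.0.0.1 stands in for a real web site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content; a real server would read it from a file.
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Server side: waits for a request, then sends the page back."""

    def do_GET(self):
        self.send_response(200)  # status line: everything is okay
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a local server, make one client request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: the client is the one that initiates the request.
        connection = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5
        )
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once()
    print(status, body.decode())
```

The server never speaks first: it only answers after the client has sent its GET request, which is the ground rule HTTP lays down.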

Dissecting a request/response cycle

Now that we’ve looked at all the parts that allow computers to communicate across the Internet, let's look at the HTTP request/response cycle in more detail. There are some numbered steps below for you to work along with, so I can demonstrate some of the concepts to you more effectively.

  1. Every request/response starts by typing a URL (commonly known as a web address) into the address bar of your web browser, something like http://www.apple.com. Open a browser now, and type this URL and press Enter/Return (or follow the above link) to go to the Apple homepage. Now, one thing you may not know is that web browsers actually don’t use URLs to request web sites from servers; they use Internet Protocol or IP addresses (which function like phone numbers or postal addresses, but identify servers, rather than phones or addresses). For example, the IP address of http://www.apple.com is 17.149.160.10.
  2. Try opening a new browser tab or window, typing http://17.149.160.10 into the address bar and hitting Enter; you will get the same web page that you got to in step 1. http://www.apple.com is basically acting as an alias for http://17.149.160.10/, but why, and how? This is because people are better at remembering words than long strings of numbers. The system that makes this work is called DNS, which is a comprehensive automatic directory of all of the machines connected to the Internet. When you punch http://www.apple.com into your address bar and hit Enter, that address is sent off to a name server that tries to associate it with its IP address. There are literally millions of machines connected to the Internet, and not every DNS server has a listing for every machine online, so there’s a system in place whereby your request is referred on to another name server if the first one doesn’t have the right information. So the DNS system looks up the Apple web site, finds that it is located at 17.149.160.10, and sends this IP address back to your web browser. Your machine then sends a request to the machine at the IP address specified and waits to get a response back. If all goes well, the server sends a short message back to the client saying that everything is okay (see Figure 1), followed by the web page itself. This type of message is contained in an HTTP header.
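The referral between name servers described in step 2 can be pictured with a toy model. This is a sketch only: the server names `local` and `upstream`, the host `intranet.example` and its address are invented for illustration, and real DNS is considerably more involved. The only value taken from this article is the address given for www.apple.com.

```python
# Hypothetical name servers. Each one knows only some host names and may
# refer the question on to another server when it has no answer.
NAME_SERVERS = {
    "local": {
        "records": {"intranet.example": "192.0.2.10"},
        "refer_to": "upstream",
    },
    "upstream": {
        "records": {"www.apple.com": "17.149.160.10"},  # address from the article
        "refer_to": None,
    },
}


def resolve(hostname, server="local"):
    """Return (ip_address, servers_asked), following referrals as needed."""
    asked = []
    while server is not None:
        asked.append(server)
        entry = NAME_SERVERS[server]
        if hostname in entry["records"]:
            return entry["records"][hostname], asked
        server = entry["refer_to"]  # this server doesn't know: refer on
    raise LookupError(f"no name server knows {hostname!r}")


if __name__ == "__main__":
    print(resolve("www.apple.com"))
```

For a real lookup, Python’s `socket.gethostbyname("www.apple.com")` asks your operating system’s resolver; the address it returns today may well differ from the one quoted in this article.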

successful request response cycle

Figure 1: In this case, everything is fine, and the server returns the correct web page. If something goes wrong, for example you typed the URL incorrectly, you’ll get an HTTP error returned to your web browser instead — the infamous 404 “page not found” error is the most common example you’ll come across.

  3. Try typing in http://www.joniscool.co.uk/jonlane/. The page doesn’t exist, so you’ll get a 404 error returned. Try it with a few different fake page addresses and you’ll see a variety of different pages returned. This is because some web developers have just left the web server to return its default error pages, and others have coded custom error pages to appear when a non-existent page is requested. This is an advanced technique that won’t be covered in this course, but Stuart Colville provides a good article on it: Adding meaning to your HTTP error pages! Lastly, a note about URLs: usually the first URL you go to on a site doesn’t have an actual file name at the end of it (e.g. http://www.mysite.com/), and then subsequent pages sometimes do and sometimes don’t. You are always accessing actual files, but sometimes the web developer has set up the web server not to display the file names in the URL. This often makes for neater, easier-to-remember URLs, which leads to a better experience for the user of your web site. We won’t cover how to do this in this course as, again, it is quite advanced; we cover uploading files to a server and file/folder directory structures in Getting your content online, by Craig Grannell.
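To make the 404 case and the “URL without a file name” case concrete, here is a small sketch of the decision a server makes. Everything in it is assumed for illustration: the `FILES` table, the `index.html` default file name and the `respond` function are hypothetical, and no real web server is configured exactly this way.

```python
from http import HTTPStatus

# Hypothetical files stored on the server, keyed by path.
FILES = {
    "/index.html": "<h1>Home</h1>",
    "/about.html": "<h1>About</h1>",
}

# A custom error page a developer might have written (hypothetical).
CUSTOM_404 = "<h1>Sorry, we could not find that page</h1>"


def respond(path, custom_error_page=None):
    """Return (status_line, body) for a requested path."""
    if path.endswith("/"):
        # No file name in the URL: serve an assumed default file instead.
        path += "index.html"
    if path in FILES:
        status = HTTPStatus.OK
        body = FILES[path]
    else:
        status = HTTPStatus.NOT_FOUND
        # Fall back to a plain default error page unless a custom one is set.
        body = custom_error_page or f"<h1>{status.value} {status.phrase}</h1>"
    return f"HTTP/1.1 {status.value} {status.phrase}", body


if __name__ == "__main__":
    print(respond("/"))
    print(respond("/jonlane/"))
    print(respond("/jonlane/", CUSTOM_404))
```

The status line is the short message carried at the top of the HTTP response; the body is what the browser displays, whether that is the page you asked for or an error page.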

Types of content

Now you’ll look at the different types of content you can expect to see on the Internet. They are grouped into four types: plain text, web standards, server-side languages, and formats requiring other applications or plugins.

Plain text

In the really early days of the Internet, before any web standards or plugins came along, the Internet was mainly just images and plain text — files with an extension of .txt or similar. When a plain text file is encountered on the Internet, the browser will just display it as is, without any processing involved. You often still get plain text files on university sites.

Web standards

The basic building blocks of the World Wide Web are the three most commonly-used web standards — HTML, CSS and JavaScript.

Hypertext Markup Language is actually a pretty good name as far as communicating its purpose. HTML is what’s used to divide up a document, specify its contents and structure, and define the meaning of each part (headings, paragraphs, bulleted lists, etc.). It uses elements to identify the different components of a page.

Cascading Style Sheets give you complete control over how an element is styled and positioned. It’s easy, using style declarations, to change all paragraphs to be double-spaced (line-height: 2em;), or to make all second-level headings green (color: green;). There are a ton of advantages to separating the structure from the style, and we’ll look at this in more detail [in the next article]. To demonstrate the power of HTML and CSS used together, Figure 2 shows some plain HTML on the left, with no formatting added to it at all, while on the right you can see exactly the same HTML with some CSS styles applied to it.


Figure 2: Plain HTML on the left, HTML with CSS applied to it on the right.

Finally, JavaScript provides dynamic functions to your web site/application. You can write programs in JavaScript that will run in the web browser, requiring no special software to be installed. JavaScript allows you to add powerful interactivity and dynamic features to your web site, but it has its limitations, which brings us to server-side programming languages, and dynamic web pages.

Server-side languages

Sometimes, when browsing the Internet, you’ll come across web pages that don’t have an .html extension—they might have a .php, .asp, .aspx, .jsp, or some other strange extension. These are all examples of server-side web technologies, which can be used to create web pages with sections that change depending on variable values given to the page on the server, before the page is sent to the web browser to be displayed. For example, a movie listings page could pull movie information from a database, and display different movie information for different days, weeks or months. We’ll cover these types of web pages further in the Static versus Dynamic pages section below.
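As a rough illustration of a page whose contents depend on a value known only on the server, here is a sketch of the movie listings idea. The listings, the `render_listings` function and the weekend/weekday split are all invented for the example; a real site would typically pull this information from a database.

```python
from datetime import date

# Hypothetical listings, keyed by weekday number (Monday is 0, Sunday is 6).
LISTINGS = {
    5: ["Matinee Classics", "Family Feature"],     # Saturday
    6: ["Matinee Classics", "Documentary Night"],  # Sunday
}
WEEKDAY_DEFAULT = ["Evening Feature"]


def render_listings(day):
    """Build the HTML for one day's listings before it is sent to the browser."""
    movies = LISTINGS.get(day.weekday(), WEEKDAY_DEFAULT)
    items = "".join(f"<li>{title}</li>" for title in movies)
    return f"<h1>Showing on {day.isoformat()}</h1><ul>{items}</ul>"


if __name__ == "__main__":
    print(render_listings(date.today()))
```

The browser only ever receives the finished HTML; it has no way of telling that the list was filled in on the server a moment earlier.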

Formats requiring other applications or plugins

Because web browsers are only equipped to interpret and display certain technologies like web standards, if you’ve requested a URL that points to a file format the browser isn't able to interpret, or a web page containing a technology requiring plugins, it will either be downloaded to your computer or opened using a plugin if the browser has it installed. For example:

  1. If you encounter a Word document, Excel file, PDF, compressed file (ZIP, or RAR), complex image file such as a Photoshop PSD, or another file that the browser doesn’t understand, the browser will usually ask you if you want to download or open the file. Both of these usually have similar results, except that the latter will cause the file to be downloaded and then opened by an application that does understand it, if one is installed.
  2. If you encounter a page containing a Flash movie, Java applet, or music or video file that it doesn’t understand, the browser will play it using a plugin, if one has been installed. If not, you will usually be given a link to install the required plugin, or the file will download and look for a desktop application to run it.

Of course, there are some gray areas—for example some browsers will come with some plugins pre-installed, so you may not be aware that content is being displayed via a plugin and not natively within the browser.
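The choices described in this section can be summed up as a small decision function. This is a deliberately simplified sketch: the extension groupings and the `handle` function are assumptions made for illustration, and real browsers decide largely from the content type the server reports and from their own settings rather than from the file extension alone.

```python
import os

# Hypothetical groupings of file extensions.
NATIVE = {".html", ".txt", ".css", ".js", ".png", ".jpg", ".gif"}
PLUGIN = {".swf": "Flash", ".class": "Java"}


def handle(url_path, installed_plugins=()):
    """Describe what a browser might do with the file at url_path."""
    extension = os.path.splitext(url_path)[1].lower()
    if extension in NATIVE:
        return "display in browser"
    if extension in PLUGIN:
        plugin = PLUGIN[extension]
        if plugin in installed_plugins:
            return f"play with {plugin} plugin"
        return f"offer to install {plugin} plugin"
    # Word documents, PDFs, ZIP archives and so on end up here.
    return "ask to download or open"


if __name__ == "__main__":
    for path in ("/notes.txt", "/intro.swf", "/report.PDF"):
        print(path, "->", handle(path, installed_plugins=("Flash",)))
```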

Static vs. dynamic web sites

So what are static and dynamic web sites, and what is the difference between the two? Similar to a box of chocolates, it’s all in the filling:

A static web site is a web site where the content (eg the HTML and graphic content) is always static—it is served up to any visitor the same, unless the person who created the web site decides to manually change the copy of it on the server—this is exactly what we’ve been looking at throughout most of this article.

On a dynamic web site, on the other hand, the content on the server is the same, but instead of just being HTML, it also contains dynamic code, which may display different data depending on information such as the time of day, the user who is logged in, the date, or the search term it has been given to look for. Let’s look at an example: navigate to www.amazon.com in your web browser, and search for 5 different products. Amazon hasn’t sent you 5 different pages; it has sent you the same page 5 times, but with different dynamic information filled in each time. This different information is kept in a database, which pulls up the relevant information when requested, and gives it to the web server to insert into the dynamic page.
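The “same page, different information” idea can be sketched with an in-memory database and a page template. This is not how Amazon or any particular site does it: the `products` table, its contents, and the names `PAGE_TEMPLATE`, `make_database` and `search_page` are all made up for the example.

```python
import sqlite3
from string import Template

# One page template, reused for every search (hypothetical markup).
PAGE_TEMPLATE = Template("<h1>Results for '$term'</h1><ul>$items</ul>")


def make_database():
    """Create a small in-memory product database for the example."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT)")
    db.executemany(
        "INSERT INTO products (name) VALUES (?)",
        [("red kettle",), ("blue kettle",), ("garden chair",)],
    )
    return db


def search_page(db, term):
    """Fill the template with whatever the database returns for term."""
    rows = db.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    ).fetchall()
    # Note: a real site would also escape these values before putting
    # them into HTML; that step is left out to keep the sketch short.
    items = "".join(f"<li>{name}</li>" for (name,) in rows)
    return PAGE_TEMPLATE.substitute(term=term, items=items)


if __name__ == "__main__":
    database = make_database()
    for query in ("kettle", "chair"):
        print(search_page(database, query))
```

Both searches go through the same template; only the data pulled from the database changes between them.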

Another thing to note is that special software must be installed on the server to create a dynamic web site. Whereas normal static HTML files are saved with a file extension of .html and can just be run by the browser with no extra help, these files contain special dynamic code in addition to HTML, and are saved with special file extensions to tell the web server that they need extra processing before they are sent to the client (such as having the data inserted from the database). PHP files for example usually have a .php file extension.

There are many dynamic languages to choose from — I’ve already mentioned PHP, and other examples include Python, Ruby on Rails, ASP.NET and Coldfusion. In the end, all of these languages have pretty much the same capabilities, like talking to databases, validating information entered into forms, etc., but they do things slightly differently, and have some advantages and disadvantages. It all boils down to what suits you best.

We won’t be covering dynamic languages any further in this course, but I have provided a list of resources here in case you want to go and read up on them:

Summary

That’s it for the behind-the-scenes tour of how the Internet works. This article really just scratches the surface of a lot of the topics covered, but it is useful as it puts them all in perspective, showing how they all relate and work together. There is still a lot left to learn about the actual language syntax that makes up HTML, CSS and JavaScript, and this is where we’ll go to next — the next article focuses on the HTML, CSS and JavaScript “web standards” model of web development, and takes a look at web page code.

Exercise questions

  • Provide a brief definition for HTML and HTTP and explain the difference between the two.
  • Explain the function of a web browser.
  • Have a look around the Internet for about 5–10 minutes and try to find some different types of content: plain text, images, HTML, dynamic pages such as PHP and .NET (.aspx) pages, PDFs, Word documents, Flash movies, etc. Access some of these and have a think about how your computer displays them to you.
  • What is the difference between a static page and a dynamic page?
  • Find a list of HTTP error codes, list 5 of them, and explain what each one means.

Note: This material was originally published as part of the Opera Web Standards Curriculum, available as 3: How does the Internet work?, written by Jon Lane. Like the original, it is published under the Creative Commons Attribution, Non Commercial - Share Alike 2.5 license.