Wednesday, November 5, 2014

Regular Expression - BaiNote

Regular Expression
Regular (正規) + Expression (表示, 語法: notation, syntax) = 正規表示法 (regular expression)

Link provided by 文俊:
http://regexone.com/

Lesson 1: An Introduction and the ABCs

Regular expressions are extremely useful in extracting information from text such as code, log files, spreadsheets, or even documents. And while there is a lot of theory behind formal languages, these sets of lessons and examples will explore the more practical uses of regular expressions so that you can use them as quickly as possible.

The first thing to recognize when using regular expressions is that everything is essentially a character, and we are writing patterns to match a specific sequence of characters (also known as a string). ASCII, or Latin, letters include those on your keyboard, and Unicode characters can be used to match text in other scripts. Characters also include digits, punctuation, and special characters like %#$@!.

Below are a couple of similar strings; notice how the text changes to highlight matches as you type. A pattern that matches all the strings below may be as simple as the exact letters that are common to each string. If you enter a pattern that does not match all three strings, you will be notified in the results column.

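Lesson 1 notes: a first introduction to regular expressions (正規表示法)
Regular expressions are very useful for extracting text from web pages, logs, spreadsheets, and any other kind of document.
Although there are many languages and formats involved, the lessons and examples that follow focus on where and how regular expressions can be used, so you can pick them up quickly.
Getting to know regular expressions starts with the single character: we write a pattern to match against characters (usually called a string), which can be ASCII, Unicode, or any kind of symbol.
Below are three strings, and all of them must match your pattern. If your pattern fails to match any of them, an error message appears!

Answer: abc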
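
As a rough illustration of the answer above, here is a minimal Python sketch using the standard `re` module. The sample strings are hypothetical stand-ins, since the exercise's actual strings are not reproduced in this post; the only thing taken from the lesson is the pattern `abc`.

```python
import re

# Hypothetical sample strings (assumed for illustration, not from the exercise).
samples = ["abcdefg", "abcde", "abc"]

# A pattern made only of ordinary characters matches that exact sequence.
pattern = re.compile(r"abc")

# re.search succeeds if the pattern occurs anywhere in the string.
results = [bool(pattern.search(s)) for s in samples]
print(results)  # [True, True, True]

# A string that lacks the sequence "abc" does not match.
print(pattern.search("acb"))  # None
```

The pattern is just the letters all three strings have in common, which is why it matches each of them.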
--------------------------------------------------------------------------------------------------------------


Lesson 1½: The 123s

Characters include not only normal letters but digits as well. In fact, the digits 0-9 are also just characters, and if you look at an ASCII table, they are listed sequentially.

Over the various lessons, you will be introduced to a number of special characters used in regular expressions that can be used to match a specific type of character. In this case, the metacharacter \d can be used in place of any digit from 0 to 9. The preceding backslash distinguishes it from the simple d character.

Below are a few more similar strings containing digits. Try and write a pattern that matches all the digits in the strings below, and notice how your pattern matches anywhere within the string, not just starting at the first character. We will learn how to control this in a later lesson.
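
To make the "matches anywhere" behaviour concrete, here is a small Python sketch with the standard `re` module. The sample strings are made up for illustration and are not the ones from the exercise.

```python
import re

# Hypothetical sample strings (assumed for illustration).
samples = ["order 66 shipped", "v2", "no digits here"]

# \d matches one digit at a time; findall collects every match in the string.
digits = [re.findall(r"\d", s) for s in samples]
print(digits)  # [['6', '6'], ['2'], []]

# re.search scans the whole string, so the digit need not come first.
found = re.search(r"\d", "v2")
print(found.start())  # 1

# re.match only tries the very beginning of the string, so it fails here.
at_start = re.match(r"\d", "v2")
print(at_start)  # None
```

In Python, the difference between `re.search` and `re.match` is one way to control where a match may begin.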

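Lesson 1½ notes: The 123s

Strings contain ordinary letters, and digits are in fact characters too; just like the letters, they are listed in sequential order in the ASCII table.
Next comes how to match digits, which uses a fixed special notation: we use \d to match any digit.

Below are similar strings that contain digits. Try writing a pattern to match them, but note that the matched digits need not be at the first character of the string!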

Lesson 2: The Dot

In some card games, the Joker is a wildcard and can represent any card in the deck. In regular expressions, you are often matching pieces of text whose exact contents you don't know, other than the fact that they follow some pattern (e.g. phone numbers).

Similarly, there is the concept of a wildcard, which is represented by the . (dot) metacharacter, and can match any single character (letter, digit, whitespace, everything). You may notice that this actually overrides the matching of the period, so in order to specifically match a period, you need to escape the dot with a backslash, writing \. instead.
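
The difference between the wildcard and an escaped dot can be sketched in Python with the standard `re` module. The four strings below are hypothetical examples of equal length, chosen so that only the first three end in a period.

```python
import re

# Hypothetical strings of equal length (assumed for illustration).
samples = ["cat.", "896.", "?=+.", "abc1"]

# Three wildcards followed by an escaped dot: the last character must be a period.
escaped = re.compile(r"...\.")
# Four wildcards: the last character can be anything.
unescaped = re.compile(r"....")

escaped_results = [bool(escaped.fullmatch(s)) for s in samples]
unescaped_results = [bool(unescaped.fullmatch(s)) for s in samples]

print(escaped_results)    # [True, True, True, False]
print(unescaped_results)  # [True, True, True, True]

# Note: in Python the dot does not match a newline unless re.DOTALL is set.
newline_match = re.fullmatch(r".", "\n")
print(newline_match)  # None
```

Without the backslash, the final dot accepts the `1` in the last string as well, which is exactly the mistake the exercise warns about.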

Below are a couple strings with varying types of characters, but of the same length. Try and use the dot to write a pattern that can match all the strings except the last entry (you are supposed to skip it). Notice how if you don't escape the dot to match the period in the last character of the first three strings, the dot will match the 1 in the last string (which you should not do).
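Lesson 2 notes: The Dot
In some card games, the joker can stand in for any card. In regular expressions, you often face content you cannot identify exactly, even though it still follows some pattern.
Likewise, the wildcard among characters is the "." (dot), which matches anything: letters, digits, whitespace, or any other character. You may notice that a bare dot therefore cannot be used to match a literal period, so you must escape it with a backslash: \.
The strings below are of different types but the same length. Try using the dot to write a pattern that matches all of them except the last line, which you should skip. If you leave out the escape character \, the pattern cannot exclude the last line, because the unescaped dot matches its final character too.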

Lesson 3: Matching specific characters

The dot metacharacter from the last lesson is pretty powerful, but sometimes too powerful. If we are matching phone numbers for example, we don't want to validate "(abc) def-ghij" as being a valid number!

There is a method for matching specific characters using regular expressions, by defining them inside square brackets. For example, the pattern [abc] will only match a single a, b, or c letter and nothing else.
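
A short Python sketch of square-bracket character sets, using the standard `re` module. The word lists and the `[cmf]an` pattern are hypothetical examples, not the strings from the exercise.

```python
import re

# [abc] matches exactly one character, and only if it is a, b, or c.
letters = re.findall(r"[abc]", "cabbage")
print(letters)  # ['c', 'a', 'b', 'b', 'a']

# Hypothetical word lists (assumed for illustration): accept the first three,
# reject the last three.
wanted = ["can", "man", "fan"]
unwanted = ["dan", "ran", "pan"]

# The set [cmf] restricts the first letter; "an" must follow literally.
pattern = re.compile(r"[cmf]an")

wanted_results = [bool(pattern.fullmatch(w)) for w in wanted]
unwanted_results = [bool(pattern.fullmatch(w)) for w in unwanted]
print(wanted_results)    # [True, True, True]
print(unwanted_results)  # [False, False, False]

# By contrast, a dot in the first position would accept all six words.
dot_results = [bool(re.fullmatch(r".an", w)) for w in wanted + unwanted]
print(dot_results)  # [True, True, True, True, True, True]
```

The dot version shows why the wildcard is too permissive here: only an explicit set of allowed letters can tell the two groups apart.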

Below are a couple of strings. We only want to match the first three strings, but not the last three. Notice how we can't avoid matching the last three strings if we use the dot; instead, we have to specifically define which letters to match using the notation above.


