Recipe 1.8. Processing a String One Character at a Time
2009-11-06 23:22
If you're processing an ASCII document, then each byte corresponds to one character. Use String#each_byte to yield each byte of a string as a number, which you can turn into a one-character string:
'foobar'.each_byte { |x| puts "#{x} = #{x.chr}" }
# 102 = f
# 111 = o
# 111 = o
# 98 = b
# 97 = a
# 114 = r
Use String#scan to yield each character of a string as a new one-character string:
'foobar'.scan(/./) { |c| puts c }
# f
# o
# o
# b
# a
# r
Since a string is a sequence of bytes, you might think that the String#each method would iterate over the sequence, the way Array#each does. But in Ruby 1.8, String#each actually splits a string on a given record separator (by default, the newline). Ruby 1.9 and later drop String#each in favor of String#each_line.
"foo\nbar".each { |x| puts x }
# foo
# bar
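For readers on a newer interpreter, the same record splitting can be sketched with String#each_line, which also accepts a separator argument. This assumes Ruby 1.9 or later; the variable names lines and fields are illustrative only.

```ruby
# Collect the records that each_line yields; each keeps its trailing separator.
lines = []
"foo\nbar".each_line { |x| lines << x }
p lines
# ["foo\n", "bar"]

# Pass a different record separator to split on something other than a newline.
fields = []
"foo,bar".each_line(',') { |x| fields << x }
p fields
# ["foo,", "bar"]
```

Note that each record still carries its separator, so call chomp on it if you only want the text.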
String#each_byte is faster than String#scan, so if you're processing an ASCII file, you might want to use String#each_byte and convert each number passed into the code block to a string (as seen in the Solution).
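The speed difference is easy to measure on your own interpreter. The sketch below uses the standard Benchmark library. The helper method names, the sample string, and the iteration count are all assumptions made for illustration, and the timings will vary with the Ruby version and the machine.

```ruby
require 'benchmark'

# Hypothetical helpers: both return an array of one-character strings.
def chars_via_each_byte(str)
  out = []
  str.each_byte { |b| out << b.chr }
  out
end

def chars_via_scan(str)
  out = []
  str.scan(/./) { |c| out << c }
  out
end

sample = 'foobar' * 1_000 # ASCII-only sample data (assumed)

# Time 100 passes of each approach over the same string.
Benchmark.bm(10) do |bm|
  bm.report('each_byte') { 100.times { chars_via_each_byte(sample) } }
  bm.report('scan')      { 100.times { chars_via_scan(sample) } }
end
```

Both helpers produce the same characters for ASCII input, so the comparison measures only the iteration technique.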
If you have the $KCODE variable set correctly, then the scan technique will work on UTF-8 strings as well. This is the simplest way to sneak a notion of "character" into Ruby's byte-based strings.
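french = "\xc3\xa7a va"

# Without a UTF-8 hint, the two bytes of ç come out as separate, incomplete characters.
french.scan(/./) { |c| puts c }
# (byte \xc3, the first half of ç)
# (byte \xa7, the second half of ç)
# a
#
# v
# a

# The u modifier makes this one regular expression treat the string as UTF-8.
french.scan(/./u) { |c| puts c }
# ç
# a
#
# v
# a

# Setting $KCODE makes every regular expression treat strings as UTF-8.
$KCODE = 'u'
french.scan(/./) { |c| puts c }
# ç
# a
#
# v
# a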
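Ruby 1.9 and later attach an encoding to every string, so $KCODE no longer has any effect there, and String#each_char yields whole characters directly. A minimal sketch, assuming a modern interpreter and UTF-8 data; the force_encoding call is only there to make that assumption explicit, and the chars array is illustrative.

```ruby
# The same bytes as in the French example, explicitly tagged as UTF-8.
french = "\xc3\xa7a va".dup.force_encoding('UTF-8')

# each_char yields one complete character per iteration.
chars = []
french.each_char do |c|
  chars << c
  puts c
end
# ç
# a
#
# v
# a

# The raw bytes are still available; ç accounts for the first two numbers.
p french.bytes
# [195, 167, 97, 32, 118, 97]
```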