とりぷるぷぅ Technical Notes: January 2015

[Perl] cloc: counting lines of source code

Download
cloc
Example run

$ perl cloc-1.60.pl ./hello/
      32 text files.
      30 unique files.
      18 files ignored.

http://cloc.sourceforge.net v 1.60  T=1.00 s (15.0 files/s, 9409.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Bourne Shell                     5            774            915           5784
make                             4            110             41            849
m4                               2             96             14            781
C                                2              4              1             16
C/C++ Header                     2              7              9              8
-------------------------------------------------------------------------------
SUM:                            15            991            980           7438
-------------------------------------------------------------------------------

Excluding Makefiles and the like (make, m4, and Bourne Shell files)

$ perl cloc-1.60.pl --exclude-lang=make,m4,"Bourne Shell" ./hello/
      32 text files.
      30 unique files.
      29 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.50 s (8.0 files/s, 90.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                                2              4              1             16
C/C++ Header                     2              7              9              8
-------------------------------------------------------------------------------
SUM:                             4             11             10             24
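
As a cross-check of the two reports above, the following is a minimal Python 3 sketch that re-adds the per-language rows of the first report and then drops the excluded languages. The `ROWS` list and the `summarize` helper are hypothetical names introduced only for this illustration; they are not part of cloc.

```python
# Rows copied from the first cloc report: (language, files, blank, comment, code)
ROWS = [
    ("Bourne Shell", 5, 774, 915, 5784),
    ("make", 4, 110, 41, 849),
    ("m4", 2, 96, 14, 781),
    ("C", 2, 4, 1, 16),
    ("C/C++ Header", 2, 7, 9, 8),
]


def summarize(rows, exclude=()):
    """Return (kept_rows, totals), skipping languages listed in exclude.

    Hypothetical helper: it only mimics the effect of --exclude-lang
    on an already produced report.
    """
    kept = [row for row in rows if row[0] not in exclude]
    # Sum each numeric column (files, blank, comment, code) separately
    totals = tuple(sum(row[i] for row in kept) for i in range(1, 5))
    return kept, totals


if __name__ == "__main__":
    print(summarize(ROWS)[1])
    print(summarize(ROWS, exclude=("make", "m4", "Bourne Shell"))[1])
```

Both totals agree with the SUM rows that cloc printed.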

[Python][XML] lxml

Installation
- Windows
  1. From "Python Package Index: lxml 2.3", download the package for Python 2.7 (lxml-2.3.win32-py2.7.exe)
  2. Run the downloaded file

Verifying the installation (test03.py)

# -*- coding:utf-8 -*-
import lxml.etree

tree = lxml.etree.parse('test01.xml')
root = tree.getroot()
print "root.tag:", root.tag
foods = root.iterfind('food')
print "foods:", foods

for food in foods:
    print "food:", food
    # Iterating over an element yields its direct children
    for c in food:
        print "child.tag:", c.tag

    name = food.find('name')
    print "tag:", name.tag
    print "text:", name.text

Output

> python test03.py
root.tag: breakfast_menu
foods: 
food: 
child.tag: name
child.tag: price
child.tag: description
child.tag: calories
tag: name
text: Belgian Waffles
food: 
child.tag: name
child.tag: price
child.tag: description
child.tag: calories
tag: name
text: Strawberry Belgian Waffles
food: 
child.tag: name
child.tag: price
child.tag: description
child.tag: calories
tag: name
text: Berry-Berry Belgian Waffles
food: 
child.tag: name
child.tag: price
child.tag: description
child.tag: calories
tag: name
text: French Toast
food: 
child.tag: name
child.tag: price
child.tag: description
child.tag: calories
tag: name
text: Homestyle Breakfast
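
The same traversal can be sketched in Python 3 without installing anything. Note the swap: this uses the standard library's `xml.etree.ElementTree` rather than lxml, on the assumption that `fromstring`, `iterfind`, and `find` behave the same way for this simple case. The shortened `XML` string and the `food_summary` helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for test01.xml (assumption: same structure, fewer entries)
XML = """<breakfast_menu>
  <food id="0"><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food id="3"><name>French Toast</name><price>$4.50</price><calories>600</calories></food>
</breakfast_menu>"""


def food_summary(xml_text):
    """Return the root tag and, per <food>, its name text and child tags."""
    root = ET.fromstring(xml_text)
    result = []
    for food in root.iterfind("food"):
        # Iterating over an element yields its direct children
        child_tags = [child.tag for child in food]
        result.append((food.find("name").text, child_tags))
    return root.tag, result


if __name__ == "__main__":
    tag, foods = food_summary(XML)
    print("root.tag:", tag)
    for name, child_tags in foods:
        print("text:", name, "children:", child_tags)
```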

[Python][XML] minidom - Displaying Node data

Display the data of each Node in order.
test02.py

# -*- coding:utf-8 -*-
import xml.dom.minidom

# Print the text of every <tag> element found under node
def dispNodeData(node, tag):
    l = node.getElementsByTagName(tag)
    for n in l:
        print n.nodeName, " - ", n.childNodes.item(0).nodeValue

# Main function
dom = xml.dom.minidom.parse('test01.xml')
foods = dom.getElementsByTagName('food')
for food in foods:
    print "nodeName: ", food.nodeName
    dispNodeData(food, 'name')
    dispNodeData(food, 'price')
    dispNodeData(food, 'calories')

Output

> python test02.py
nodeName: food
name - Belgian Waffles
price - $5.95
calories - 650
nodeName: food
name - Strawberry Belgian Waffles
price - $7.95
calories - 900
nodeName: food
name - Berry-Berry Belgian Waffles
price - $8.95
calories - 900
nodeName: food
name - French Toast
price - $4.50
calories - 600
nodeName: food
name - Homestyle Breakfast
price - $6.95
calories - 950
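
One caveat with `n.childNodes.item(0).nodeValue`: for an empty element, `item(0)` returns None and the attribute access fails. A minimal Python 3 sketch of a more defensive variant follows. `node_texts` is a hypothetical helper, and returning an empty string for empty elements is an assumption made for the example.

```python
import xml.dom.minidom


def node_texts(node, tag):
    """Collect the leading text of every <tag> element under node."""
    texts = []
    for element in node.getElementsByTagName(tag):
        first = element.firstChild
        if first is not None and first.nodeType == xml.dom.Node.TEXT_NODE:
            texts.append(first.data)
        else:
            # Empty element, or first child is not text (assumed fallback)
            texts.append("")
    return texts


if __name__ == "__main__":
    dom = xml.dom.minidom.parseString(
        "<menu><food><name>French Toast</name><price></price></food></menu>"
    )
    food = dom.getElementsByTagName("food")[0]
    print(node_texts(food, "name"))
    print(node_texts(food, "price"))
```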

[Python][XML] xml.dom.minidom

To parse XML, use minidom (xml.dom.minidom in the Python standard library).

test01.xml

  <?xml version="1.0" encoding="utf-8"?>
  <breakfast_menu>
    <food id="0">
      <name>Belgian Waffles</name>
      <price>$5.95</price>
      <description>two of our famous Belgian Waffles with plenty of real maple syrup</description>
      <calories>650</calories>
    </food>
    <food id="1">
      <name>Strawberry Belgian Waffles</name>
      <price>$7.95</price>
      <description>light Belgian waffles covered with strawberries and whipped cream</description>
      <calories>900</calories>
    </food>
    <food id="2">
      <name>Berry-Berry Belgian Waffles</name>
      <price>$8.95</price>
      <description>light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
      <calories>900</calories>
    </food>
    <food id="3">
      <name>French Toast</name>
      <price>$4.50</price>
      <description>thick slices made from our homemade sourdough bread</description>
      <calories>600</calories>
    </food>
    <food id="4">
      <name>Homestyle Breakfast</name>
      <price>$6.95</price>
      <description>two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
      <calories>950</calories>
    </food>
  </breakfast_menu>

test01.py

# -*- coding:utf-8 -*-
import xml.dom.minidom
import pprint

dom = xml.dom.minidom.parse('test01.xml')
pprint.pprint(dom)

# NodeList
foods = dom.getElementsByTagName('food')
print "NodeList(food)"
pprint.pprint(foods)
print "length: %s" % (foods.length)

# Node
food = foods.item(0)
pprint.pprint(food)
print "Node(food)"
print "nodeType: ", food.nodeType
print "nodeName: ", food.nodeName
print "hasChildNodes(): ", food.hasChildNodes()
print "hasAttributes(): ", food.hasAttributes()

# NodeList
names = food.getElementsByTagName('name')
print "NodeList(name)"
pprint.pprint(names)
print "length: ", names.length
name = names.item(0)
print "Node(name)"
pprint.pprint(names)  # prints the NodeList again; pprint.pprint(name) would show the single Node
print "nodeType: ", name.nodeType
print "nodeName: ", name.nodeName
print "nodeValue: ", name.nodeValue
print "hasChildNodes(): ", name.hasChildNodes()
print "hasAttributes(): ", name.hasAttributes()

# NodeList
nameDatas = name.childNodes
print "NodeList(data)"
pprint.pprint(nameDatas)
print "length: ", nameDatas.length

# Node
data = nameDatas.item(0)
print "Text Node(data)"
pprint.pprint(data)
print "nodeType: ", data.nodeType
print "nodeName: ", data.nodeName
print "nodeValue: ", data.nodeValue
print "data: ", data.data

# nodeType constants
print "nodeType constants"
node = xml.dom.Node
print " ELEMENT_NODE : ", node.ELEMENT_NODE
print " ATTRIBUTE_NODE : ", node.ATTRIBUTE_NODE
print " TEXT_NODE : ", node.TEXT_NODE
print " CDATA_SECTION_NODE : ", node.CDATA_SECTION_NODE
print " ENTITY_NODE : ", node.ENTITY_NODE
print " PROCESSING_INSTRUCTION_NODE: ", node.PROCESSING_INSTRUCTION_NODE
print " COMMENT_NODE : ", node.COMMENT_NODE
print " DOCUMENT_NODE : ", node.DOCUMENT_NODE
print " DOCUMENT_TYPE_NODE : ", node.DOCUMENT_TYPE_NODE
print " NOTATION_NODE : ", node.NOTATION_NODE

Output

> python test01.py
<xml.dom.minidom.Document instance at 0x00B402D8>
NodeList(food)
[<DOM Element: food at 0xb45418>,
<DOM Element: food at 0xb457d8>,
<DOM Element: food at 0xb45b98>,
<DOM Element: food at 0xb45f58>,
<DOM Element: food at 0xb4b350>]
length: 5
<DOM Element: food at 0xb45418>
Node(food)
nodeType: 1
nodeName: food
hasChildNodes(): True
hasAttributes(): True
NodeList(name)
[<DOM Element: name at 0xb45530>]
length: 1
Node(name)
[<DOM Element: name at 0xb45530>]
nodeType: 1
nodeName: name
nodeValue: None
hasChildNodes(): True
hasAttributes(): False
NodeList(data)
[<DOM Text node "u'Belgian Wa'...">]
length: 1
Text Node(data)
<DOM Text node "u'Belgian Wa'...">
nodeType: 3
nodeName: #text
nodeValue: Belgian Waffles
data: Belgian Waffles
nodeType constants
ELEMENT_NODE : 1
ATTRIBUTE_NODE : 2
TEXT_NODE : 3
CDATA_SECTION_NODE : 4
ENTITY_NODE : 6
PROCESSING_INSTRUCTION_NODE: 7
COMMENT_NODE : 8
DOCUMENT_NODE : 9
DOCUMENT_TYPE_NODE : 10
NOTATION_NODE : 12
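
The output above shows that an element's nodeValue is None and that the text lives in a child Text node. A small Python 3 sketch makes the same point and also reads the `id` attribute that hasAttributes() reported. The one-entry `SAMPLE` string and the `describe` helper are assumptions introduced for the illustration.

```python
import xml.dom.minidom

# One-entry stand-in for test01.xml (assumption)
SAMPLE = (
    '<breakfast_menu><food id="0">'
    "<name>Belgian Waffles</name>"
    "</food></breakfast_menu>"
)


def describe(node):
    """Return (nodeType, nodeName, nodeValue) for any DOM node."""
    return node.nodeType, node.nodeName, node.nodeValue


dom = xml.dom.minidom.parseString(SAMPLE)
food = dom.getElementsByTagName("food")[0]
name = food.getElementsByTagName("name")[0]
text = name.firstChild  # the Text node that holds "Belgian Waffles"

if __name__ == "__main__":
    print(describe(dom))
    print(describe(food))
    print(describe(text))
    print("id:", food.getAttribute("id"))
```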

[awk] Summing values from selected rows

test04-3.awk

BEGIN {
 # Initialize variables
 FS=","
 dog_sum1 = 0
 dog_sum2 = 0
 cat_sum1 = 0
 cat_sum2 = 0
}

{
 if ($2 == "dog") {
  dog_sum1 += $3
  dog_sum2 += $4
 }
 else if ($2 == "cat") {
  cat_sum1 += $3
  cat_sum2 += $4
 }
}

END {
 # Print the results
 print "category,sum1,sum2"
 print "dog,"dog_sum1","dog_sum2
 print "cat,"cat_sum1","cat_sum2
}

test.csv

This is test.

1,dog,10,20
2,cat,500,200
3,dog,40,20
4,cat,10,500

Output

> awk -f test04-3.awk test.csv
category,sum1,sum2
dog,50,40
cat,510,700
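
For comparison, the same aggregation can be sketched in Python 3. This is not a translation the awk script depends on, just an illustration. `sum_by_category` is a hypothetical name, and skipping rows with fewer than four fields is an assumption that reproduces how the awk script ignores the "This is test." line and the blank line.

```python
import csv
import io

# Contents of test.csv as shown above
TEST_CSV = """This is test.

1,dog,10,20
2,cat,500,200
3,dog,40,20
4,cat,10,500
"""


def sum_by_category(text, categories=("dog", "cat")):
    """Sum the 3rd and 4th columns of each row whose 2nd column is a known category."""
    sums = {category: [0, 0] for category in categories}
    for row in csv.reader(io.StringIO(text)):
        # Rows without four fields (header text, blank lines) are skipped
        if len(row) >= 4 and row[1] in sums:
            sums[row[1]][0] += int(row[2])
            sums[row[1]][1] += int(row[3])
    return sums


if __name__ == "__main__":
    print("category,sum1,sum2")
    for category, (sum1, sum2) in sum_by_category(TEST_CSV).items():
        print("%s,%d,%d" % (category, sum1, sum2))
```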

[awk] Prepending a string to lines

Use a regular expression to pick out the lines that start with a digit, and prepend a string to each of them.
test04-2.awk

BEGIN {
}

# Prepend a string to lines that start with a digit
{
 if (match($1, /^[0-9]/)) {
  # Only lines whose first field starts with a digit reach this point
  print "hoge,"$0
 }
}

END {

}

test.csv

This is test.

1,dog,10,20
2,cat,500,200
3,dog,40,20
4,cat,10,500

Run

> awk -f test04-2.awk test.csv > out.csv

Output (out.csv)

hoge,1,dog,10,20
hoge,2,cat,500,200
hoge,3,dog,40,20
hoge,4,cat,10,500
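
A rough Python 3 counterpart of the filter above, as a sketch only. `prepend_to_numeric_lines` is a hypothetical name. Stripping leading spaces and tabs before the test is an assumption meant to approximate how awk's default field splitting ignores leading whitespace when it builds $1.

```python
import re

# Lines of test.csv as shown above
LINES = [
    "This is test.",
    "",
    "1,dog,10,20",
    "2,cat,500,200",
    "3,dog,40,20",
    "4,cat,10,500",
]


def prepend_to_numeric_lines(lines, prefix="hoge,"):
    """Keep lines whose first field starts with a digit and prepend prefix."""
    selected = []
    for line in lines:
        # re.match anchors at the start, like ^ in the awk pattern
        if re.match(r"[0-9]", line.lstrip(" \t")):
            selected.append(prefix + line)
    return selected


if __name__ == "__main__":
    for out_line in prepend_to_numeric_lines(LINES):
        print(out_line)
```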

[awk] Regular expressions

match() can be used to test a field against a regular expression.
test04.awk

BEGIN {
}

# Print only the lines that start with a digit
{
 # if ($1 ~ /^[0-9]/) {
 # With this form, "buffer:" lines are printed as well
 if (match($1, /^[0-9]/)) {
  print $0
 }
}

END {

}

test.csv

This is test.

1,dog,10,20
2,cat,500,200
3,dog,40,20
4,cat,10,500

Output

> awk -f test04.awk test.csv
1,dog,10,20
2,cat,500,200
3,dog,40,20
4,cat,10,500

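If ($1 ~ /^[0-9]/) is used instead of match(), extra lines starting with "buffer:" are printed. The ~ operator is standard awk and does not normally print anything, so this is presumably a quirk of the awk build used here.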

> awk -f test04.awk test.csv
buffer: T
buffer: 1
1,dog,10,20
buffer: 2
2,cat,500,200
buffer: 3
3,dog,40,20
buffer: 4
4,cat,10,500
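
Besides returning a true or false result, awk's match() also sets RSTART and RLENGTH. The Python 3 sketch below imitates that behavior with the `re` module. `awk_match` is a hypothetical helper, and Python's regular expression syntax is assumed to agree with awk's for a simple pattern like this one.

```python
import re


def awk_match(text, pattern):
    """Return (RSTART, RLENGTH) roughly as awk's match() would set them."""
    found = re.search(pattern, text)
    if found is None:
        # awk sets RSTART to 0 and RLENGTH to -1 when nothing matches
        return 0, -1
    # awk string positions are 1-based
    return found.start() + 1, found.end() - found.start()


if __name__ == "__main__":
    for field in ["This", "1,dog,10,20", ""]:
        print(repr(field), awk_match(field, r"^[0-9]"))
```

A non-zero first value corresponds to the true branch of `if (match($1, /^[0-9]/))`.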

Thursday, January 15, 2015