Project Stage 1 - `-fdump-rtl-all`
Intro
In the last post, we discussed the `-fdump-tree-all` option and the files it generates. In this post, we will run `gcc -fdump-rtl-all test.c -o test` and try to work out what each dump shows.
Progress
Let's start by compiling with the `-fdump-rtl-all` option. The result will be:
$ gcc -fdump-rtl-all test.c -o test
test test.c.271r.jump test.c.310r.mode_sw test.c.329r.jump2 test.c.359r.shorten test.c test.c.283r.reginfo test.c.311r.asmcons test.c.349r.zero_call_used_regs test.c.360r.nothrow test.c.268r.expand test.c.306r.outof_cfglayout test.c.318r.ira test.c.350r.alignments test.c.361r.dwarf2 test.c.269r.vregs test.c.307r.split1 test.c.319r.reload test.c.354r.barriers test.c.362r.final test.c.270r.into_cfglayout test.c.309r.dfinit test.c.326r.pro_and_epilogue test.c.356r.split5 test.c.363r.dfinish
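The listing above is in column order, so the passes look shuffled. Here is a small Python sketch that sorts the same file names by the number in front of the `r`. I'm assuming that number reflects the order in which the passes ran; the helper name `pass_order` is just something I made up for this post.

```python
import re

# File names copied from the directory listing above.
listing = """
test test.c.271r.jump test.c.310r.mode_sw test.c.329r.jump2 test.c.359r.shorten
test.c test.c.283r.reginfo test.c.311r.asmcons test.c.349r.zero_call_used_regs
test.c.360r.nothrow test.c.268r.expand test.c.306r.outof_cfglayout test.c.318r.ira
test.c.350r.alignments test.c.361r.dwarf2 test.c.269r.vregs test.c.307r.split1
test.c.319r.reload test.c.354r.barriers test.c.362r.final
test.c.270r.into_cfglayout test.c.309r.dfinit test.c.326r.pro_and_epilogue
test.c.356r.split5 test.c.363r.dfinish
"""

# Matches the ".<number>r.<pass name>" suffix of an RTL dump file.
DUMP_SUFFIX = re.compile(r"\.(\d+)r\.(\w+)$")


def pass_order(names):
    """Return (number, pass_name) pairs sorted by dump number."""
    passes = []
    for name in names:
        match = DUMP_SUFFIX.search(name)
        if match:  # skips "test" and "test.c", which are not dumps
            passes.append((int(match.group(1)), match.group(2)))
    return sorted(passes)


ordered = pass_order(listing.split())
for number, name in ordered:
    print(number, name)
```

Sorted this way, `expand`, `vregs` and `into_cfglayout` come first, which is why I look at those three files below.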
Let's take a look at the first RTL file, `test.c.268r.expand`:
;; Function main (main, funcdef_no=0, decl_uid=4853, cgraph_uid=1, symbol_order=0)
;; Generating RTL for gimple basic block 2
;; Generating RTL for gimple basic block 3
;; Generating RTL for gimple basic block 4
;; Generating RTL for gimple basic block 5
;; Generating RTL for gimple basic block 6
;; Generating RTL for gimple basic block 7
;; Generating RTL for gimple basic block 8
;; Generating RTL for gimple basic block 9
try_optimize_cfg iteration 1
Merging block 3 into block 2...
Merged blocks 2 and 3.
Merged 2 and 3 without moving.
Merging block 10 into block 9...
Merged blocks 9 and 10.
Merged 9 and 10 without moving.
Removing jump 54.
Merging block 11 into block 9...
Merged blocks 9 and 11.
Merged 9 and 11 without moving.
try_optimize_cfg iteration 2
;;
;; Full RTL generated for this function:
;;
(note 1 0 3 NOTE_INSN_DELETED)
(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(const_int 0 [0])) "test.c":4:9 -1
(nil))
(insn 6 5 7 2 (set (reg:SI 105)
(const_int 1 [0x1])) "test.c":7:14 -1
(nil))
(insn 7 6 8 2 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])
(reg:SI 105)) "test.c":7:14 -1
(nil))
(jump_insn 8 7 9 2 (set (pc)
(label_ref 37)) "test.c":7:5 -1
(nil)
-> 37)
(barrier 9 8 39)
(code_label 39 9 10 4 5 (nil) [1 uses])
(note 10 39 11 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 11 10 12 4 (set (reg:SI 107)
(mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])) "test.c":8:13 -1
(nil))
(insn 12 11 13 4 (set (reg:SI 108)
(mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":8:13 -1
(nil))
(insn 13 12 14 4 (set (reg:SI 106 [ sum_12 ])
(plus:SI (reg:SI 107)
(reg:SI 108))) "test.c":8:13 -1
(nil))
(insn 14 13 15 4 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(reg:SI 106 [ sum_12 ])) "test.c":8:13 -1
(nil))
(insn 15 14 16 4 (set (reg:SI 101 [ i.0_1 ])
(mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":11:19 -1
(nil))
(insn 16 15 17 4 (set (reg:SI 102 [ _2 ])
(and:SI (reg:SI 101 [ i.0_1 ])
(const_int 1 [0x1]))) "test.c":11:19 -1
(nil))
(insn 17 16 18 4 (set (reg:CC 66 cc)
(compare:CC (reg:SI 102 [ _2 ])
(const_int 0 [0]))) "test.c":11:12 -1
(nil))
(jump_insn 18 17 19 4 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 26)
(pc))) "test.c":11:12 -1
(nil)
-> 26)
(note 19 18 20 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 20 19 21 5 (set (reg:SI 1 x1)
(mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":12:13 -1
(nil))
(insn 21 20 22 5 (set (reg:DI 109)
(high:DI (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0xffffabf476c0 *.LC0>))) "test.c":12:13 -1
(nil))
(insn 22 21 23 5 (set (reg:DI 0 x0)
(lo_sum:DI (reg:DI 109)
(symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0xffffabf476c0 *.LC0>))) "test.c":12:13 -1
(expr_list:REG_EQUAL (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0xffffabf476c0 *.LC0>)
(nil)))
Wow, this is a long and daunting dump, and I didn't even paste all of it; this is only about half of the file. I can't tell what is what yet, and it looks almost impossible to understand. According to the GCC docs, this file is the dump "after RTL generation".
Let's take a look at the next one, `test.c.269r.vregs`:
;; Function main (main, funcdef_no=0, decl_uid=4853, cgraph_uid=1, symbol_order=0)
(note 1 0 3 NOTE_INSN_DELETED)
(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(const_int 0 [0])) "test.c":4:9 69 {*movsi_aarch64}
(nil))
(insn 6 5 7 2 (set (reg:SI 105)
(const_int 1 [0x1])) "test.c":7:14 69 {*movsi_aarch64}
(nil))
(insn 7 6 8 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])
(reg:SI 105)) "test.c":7:14 69 {*movsi_aarch64}
(nil))
(jump_insn 8 7 9 2 (set (pc)
(label_ref 37)) "test.c":7:5 6 {jump}
(nil)
-> 37)
(barrier 9 8 39)
(code_label 39 9 10 4 5 (nil) [1 uses])
(note 10 39 11 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 11 10 12 4 (set (reg:SI 107)
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 12 11 13 4 (set (reg:SI 108)
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 13 12 14 4 (set (reg:SI 106 [ sum_12 ])
(plus:SI (reg:SI 107)
(reg:SI 108))) "test.c":8:13 119 {*addsi3_aarch64}
(nil))
(insn 14 13 15 4 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(reg:SI 106 [ sum_12 ])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 15 14 16 4 (set (reg:SI 101 [ i.0_1 ])
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":11:19 69 {*movsi_aarch64}
(nil))
(insn 16 15 17 4 (set (reg:SI 102 [ _2 ])
(and:SI (reg:SI 101 [ i.0_1 ])
(const_int 1 [0x1]))) "test.c":11:19 503 {andsi3}
(nil))
(insn 17 16 18 4 (set (reg:CC 66 cc)
(compare:CC (reg:SI 102 [ _2 ])
(const_int 0 [0]))) "test.c":11:12 404 {cmpsi}
(nil))
(jump_insn 18 17 19 4 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 26)
(pc))) "test.c":11:12 19 {condjump}
(nil)
-> 26)
(note 19 18 20 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(insn 20 19 21 5 (set (reg:SI 1 x1)
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":12:13 69 {*movsi_aarch64}
(nil))
(insn 21 20 22 5 (set (reg:DI 109)
(high:DI (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0xffff9f7476c0 *.LC0>))) "test.c":12:13 70 {*movdi_aarch64}
(nil))
According to the GCC docs, this file is dumped after converting virtual registers to hard registers. We can see the difference between the `.expand` and `.vregs` files as follows:
$ diff test.c.268r.expand test.c.269r.vregs
It shows the following result:
4,42d3
<
< ;; Generating RTL for gimple basic block 2
<
< ;; Generating RTL for gimple basic block 3
<
< ;; Generating RTL for gimple basic block 4
<
< ;; Generating RTL for gimple basic block 5
<
< ;; Generating RTL for gimple basic block 6
<
< ;; Generating RTL for gimple basic block 7
<
< ;; Generating RTL for gimple basic block 8
<
< ;; Generating RTL for gimple basic block 9
<
<
< try_optimize_cfg iteration 1
<
< Merging block 3 into block 2...
< Merged blocks 2 and 3.
< Merged 2 and 3 without moving.
< Merging block 10 into block 9...
< Merged blocks 9 and 10.
< Merged 9 and 10 without moving.
< Removing jump 54.
< Merging block 11 into block 9...
< Merged blocks 9 and 11.
< Merged 9 and 11 without moving.
<
<
< try_optimize_cfg iteration 2
<
<
<
< ;;
< ;; Full RTL generated for this function:
< ;;
46c7
< (insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
---
> (insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
48c9
< (const_int 0 [0])) "test.c":4:9 -1
---
> (const_int 0 [0])) "test.c":4:9 69 {*movsi_aarch64}
51c12
< (const_int 1 [0x1])) "test.c":7:14 -1
---
> (const_int 1 [0x1])) "test.c":7:14 69 {*movsi_aarch64}
53c14
< (insn 7 6 8 2 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
---
> (insn 7 6 8 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
55c16
< (reg:SI 105)) "test.c":7:14 -1
---
> (reg:SI 105)) "test.c":7:14 69 {*movsi_aarch64}
58c19
< (label_ref 37)) "test.c":7:5 -1
---
> (label_ref 37)) "test.c":7:5 6 {jump}
65,66c26,27
< (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
< (const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])) "test.c":8:13 -1
---
> (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
> (const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])) "test.c":8:13 69 {*movsi_aarch64}
69,70c30,31
< (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
< (const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":8:13 -1
---
> (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
> (const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":8:13 69 {*movsi_aarch64}
74c35
< (reg:SI 108))) "test.c":8:13 -1
---
> (reg:SI 108))) "test.c":8:13 119 {*addsi3_aarch64}
76c37
< (insn 14 13 15 4 (set (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
---
> (insn 14 13 15 4 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
78c39
< (reg:SI 106 [ sum_12 ])) "test.c":8:13 -1
---
> (reg:SI 106 [ sum_12 ])) "test.c":8:13 69 {*movsi_aarch64}
81,82c42,43
< (mem/c:SI (plus:DI (reg/f:DI 96 virtual-stack-vars)
< (const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":11:19 -1
---
> (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
> (const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":11:19 69 {*movsi_aarch64}
86c47
< (const_int 1 [0x1]))) "test.c":11:19 -1
---
> (const_int 1 [0x1]))) "test.c":11:19 503 {andsi3}
90c51
< (const_int 0 [0]))) "test.c":11:12 -1
---
> (const_int 0 [0]))) "test.c":11:12 404 {cmpsi}
96c57
< (pc))) "test.c":11:12 -1
---
> (pc))) "test.c":11:12 19 {condjump}
101,102c62,63
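Now we can see what the `vregs` pass did. Every `(reg/f:DI 96 virtual-stack-vars)` was replaced with `(reg/f:DI 64 sfp)`, and the `-1` at the end of each instruction was replaced with a number and a pattern name, such as `69 {*movsi_aarch64}`, `119 {*addsi3_aarch64}` or `19 {condjump}`. The first hunk of the diff (`4,42d3`) is just the log text at the top of the `.expand` file, which the `.vregs` file doesn't have.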
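To get a feel for which patterns show up, here is a rough Python sketch that counts the names in braces. The regular expression is my own guess at the line format based on the excerpts in this post (source location, then a number, then an optional `{name}`), and I'm reading `-1` as "no pattern attached yet"; it is not an official parser for RTL dumps.

```python
import re
from collections import Counter

# A few line endings copied from test.c.269r.vregs above.
vregs_excerpt = '''\
(const_int 0 [0])) "test.c":4:9 69 {*movsi_aarch64}
(const_int 1 [0x1])) "test.c":7:14 69 {*movsi_aarch64}
(label_ref 37)) "test.c":7:5 6 {jump}
(reg:SI 108))) "test.c":8:13 119 {*addsi3_aarch64}
(const_int 1 [0x1]))) "test.c":11:19 503 {andsi3}
(const_int 0 [0]))) "test.c":11:12 404 {cmpsi}
(pc))) "test.c":11:12 19 {condjump}
'''

# "file":line:col, then a code number, then an optional {pattern name}.
CODE_AND_NAME = re.compile(r'"[^"]+":\d+:\d+ (-?\d+)(?: \{([^}]+)\})?')


def pattern_names(dump_text):
    """Count how often each pattern name appears in a dump excerpt."""
    counts = Counter()
    for match in CODE_AND_NAME.finditer(dump_text):
        code, name = match.group(1), match.group(2)
        # Lines from the .expand dump end in -1 and have no name.
        counts[name if name else "unrecognized (" + code + ")"] += 1
    return counts


counts = pattern_names(vregs_excerpt)
for name, count in counts.most_common():
    print(count, name)
```

Running the same function on lines from the `.expand` file puts everything in the `unrecognized (-1)` bucket, which matches what the diff shows.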
Let's check just one more, `test.c.270r.into_cfglayout`:
;; Function main (main, funcdef_no=0, decl_uid=4853, cgraph_uid=1, symbol_order=0)
try_optimize_cfg iteration 1
Removing jump 8.
Removing jump 24.
try_optimize_cfg iteration 2
try_optimize_cfg iteration 1
(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(const_int 0 [0])) "test.c":4:9 69 {*movsi_aarch64}
(nil))
(insn 6 5 7 2 (set (reg:SI 105)
(const_int 1 [0x1])) "test.c":7:14 69 {*movsi_aarch64}
(nil))
(insn 7 6 39 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])
(reg:SI 105)) "test.c":7:14 69 {*movsi_aarch64}
(nil))
; pc falls through to BB 7
(code_label 39 7 10 3 5 (nil) [1 uses])
(note 10 39 11 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 11 10 12 3 (set (reg:SI 107)
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 12 11 13 3 (set (reg:SI 108)
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 13 12 14 3 (set (reg:SI 106 [ sum_12 ])
(plus:SI (reg:SI 107)
(reg:SI 108))) "test.c":8:13 119 {*addsi3_aarch64}
(nil))
(insn 14 13 15 3 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -4 [0xfffffffffffffffc])) [1 sum+0 S4 A32])
(reg:SI 106 [ sum_12 ])) "test.c":8:13 69 {*movsi_aarch64}
(nil))
(insn 15 14 16 3 (set (reg:SI 101 [ i.0_1 ])
(mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(const_int -8 [0xfffffffffffffff8])) [1 i+0 S4 A64])) "test.c":11:19 69 {*movsi_aarch64}
(nil))
(insn 16 15 17 3 (set (reg:SI 102 [ _2 ])
(and:SI (reg:SI 101 [ i.0_1 ])
(const_int 1 [0x1]))) "test.c":11:19 503 {andsi3}
(nil))
(insn 17 16 18 3 (set (reg:CC 66 cc)
(compare:CC (reg:SI 102 [ _2 ])
(const_int 0 [0]))) "test.c":11:12 404 {cmpsi}
(nil))
(jump_insn 18 17 19 3 (set (pc)
(if_then_else (ne (reg:CC 66 cc)
(const_int 0 [0]))
(label_ref 26)
(pc))) "test.c":11:12 19 {condjump}
According to the GCC docs, this file is dumped after converting to cfglayout mode. As you can see, some information is logged at the beginning:
try_optimize_cfg iteration 1
Removing jump 8.
Removing jump 24.
try_optimize_cfg iteration 2
try_optimize_cfg iteration 1
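To check what "Removing jump 8." did, here is a small Python sketch that compares a few instruction headers from the two dumps. It rests on my reading that the first three numbers after the opening keyword are the instruction's own id, the previous id and the next id; the function name `links` is made up for this post.

```python
import re

# Keyword followed by three numbers: own id, previous id, next id (my assumption).
HEAD = re.compile(
    r"^\((insn|jump_insn|call_insn|note|barrier|code_label) (\d+) (\d+) (\d+)",
    re.MULTILINE,
)


def links(dump_text):
    """Map each instruction id to (kind, previous id, next id)."""
    return {
        int(m.group(2)): (m.group(1), int(m.group(3)), int(m.group(4)))
        for m in HEAD.finditer(dump_text)
    }


# First lines of four instructions copied from test.c.269r.vregs.
vregs = '''\
(insn 7 6 8 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(jump_insn 8 7 9 2 (set (pc)
(barrier 9 8 39)
(code_label 39 9 10 4 5 (nil) [1 uses])
'''

# The matching lines copied from test.c.270r.into_cfglayout.
cfglayout = '''\
(insn 7 6 39 2 (set (mem/c:SI (plus:DI (reg/f:DI 64 sfp)
(code_label 39 7 10 3 5 (nil) [1 uses])
'''

before, after = links(vregs), links(cfglayout)
print("insn 7 before:", before[7])
print("insn 7 after: ", after[7])
print("jump 8 still present:", 8 in after)
```

In the `.vregs` excerpt, instruction 7 is followed by jump 8 and barrier 9. In the `.into_cfglayout` excerpt, instruction 7 links straight to label 39, so the jump and its barrier are no longer in the chain.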
Conclusion
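Since RTL dumps are very low level, I couldn't fully understand the code or everything that is going on in these files. However, I could at least see that each pass changes something concrete: `vregs` swaps `virtual-stack-vars` for `sfp` and attaches instruction pattern names, and `into_cfglayout` removes some jumps.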