rq_for_each_bio(bio, req) {
bio_for_each_segment(bv, bio, i) {
if (bv->bv_len & (blksize - 1))
/* Eckd can only do full blocks. */
return ERR_PTR(-EINVAL);
We needed to make sure that this was the actual point of failure but since we cannot modify the kernel on IBM zSeries machines, we needed to use console debugging to confirm the suspicion.
Firstly install kernel-debuginfo package which you will need for debugging.
* Get disassembly code of the module using objdump
objdump -d /lib/modules/2.6.9-67.EL/kernel/drivers/s390/block/dasd_eckd_mod.ko > dasd_eckd_dis
0000000000001ef0
1ef0: eb 6f f0 48 00 24 stmg %r6,%r15,72(%r15)
1ef6: c0 d0 00 00 00 00 larl %r13,1ef6
1efc: b9 04 00 ef lgr %r14,%r15
1f00: a7 fb fe 40 aghi %r15,-448
1f04: c0 c0 00 00 00 00 larl %r12,1f04
1f0a: e3 e0 f0 98 00 24 stg %r14,152(%r15)
1f10: e3 30 f0 e0 00 24 stg %r3,224(%r15)
.....
.....
2134: a7 0a ff ff ahi %r0,-1
2138: a7 aa 00 09 ahi %r10,9
213c: 18 51 lr %r5,%r1
213e: e3 e0 c0 00 00 04 lg %r14,0(%r12)
2144: e3 90 f1 18 00 04 lg %r9,280(%r15)
214a: 58 20 90 08 l %r2,8(%r9)
214e: 18 12 lr %r1,%r2
2150: 18 32 lr %r3,%r2
2152: 14 10 nr %r1,%r0
2154: 88 30 a0 00 srl %r3,0(%r10)
* 2158: a7 74 04 53 jne 29fe
215c: e3 10 90 00 00 04 lg %r1,0(%r9)
.....
.....
29fe: a7 29 ff ea lghi %r2,-22
2a02: a7 f4 06 1b j 3638
The line with the '*' was the suspected line which was doing the bv_len alignment check. The jump from this line to dasd_eckd_build_cp+0x29fe confirmed this. EINVAL = 22 and after loading the return value the function returns.
Now while setting breakpoints, remember not to use addresses or offsets from the objdump(if you do remember to subtract start addr of the function from the offset of the instruction). Or better yet you can use the disassembly output from GDB:
$ gdb /lib/modules/2.6.9-67.EL/kernel/drivers/s390/block/dasd_eckd_mod.ko
(gdb) disassemble dasd_eckd_build_cp
Dump of assembler code for function dasd_eckd_build_cp:
0x0000000000001ef0
0x0000000000001ef6
0x0000000000001efc
0x0000000000001f00
0x0000000000001f04
0x0000000000001f0a
0x0000000000001f10
0x0000000000001f16
0x0000000000001f1c
0x0000000000001f20
0x0000000000001f26
0x0000000000001f2a
0x0000000000001f30
0x0000000000001f34
0x0000000000001f38
0x0000000000001f3c
0x0000000000001f40
.....
.....
Remember that the offsets here are in decimal not in hex.
0x00000000000029fe
OK, so as you can see we are jumping to 0x29fe which is the above instruction for which the offset is 2830. So find out the address for dasd_eckd_build_cp from kallsyms
cat /proc/kallsyms > kallsyms.
Here it is 0x000000006088f27c, so we place a breakpoint on /x 0x000000006088f27c+ 2830
Go to the CP console and all the rest is to be done here:
#CP TR I PSWA 6088fd8a
- this will set a trace point at given address.
The reason we should not set a tracepoint on the above jump instructions is because we see many false positives and that makes debugging impossible.
You can query all tracepoints using #CP Q TR.
When you hit a tracepoint, you can do the following to continue #CP B
If you want to single step: #CP TR I
followed by #CP B
Each "#CP B" will take you forward one instruction.
When you want to end tracing and remove all traces do #CP TR END
If you want to break within a function address range:
#CP TR I R
eg. #CP TR I R 6088f27c.200
To display registers you can use:
#CP D G - group registers
#CP D X - control registers
#CP D AR - access registers
#CP D PSW - display PSW
After insmod'ing a reproducer module, with the trace on - we hit the instruction which returns EINVAL thereby confirming the problem.
1 comment:
The information here is great. I will invite my friends here.
Thanks
Post a Comment