You should make sure to read the
This guide is x86-centric for now. The majority of broken object files are caused by poorly written x86 assembly, which in turn stems from the simple fact that the x86 architecture has so few registers. Other architectures have a large enough register set that they can reserve a register as the "PIC register" without incurring a performance hit. Every architecture has to be mindful of PIC and its implications; x86 just happens to be the dominant architecture at the moment in the 'desktop' world of open source.
We will update for non-x86 as we acquire details and useful examples.
Before you can start fixing something, you've got to make sure it's broken first,
right? For this reason, we've developed a suite of tools named
Keep in mind that although these utilities are named PaX Utilities, they certainly do not require PaX or anything else like that on your system. The name is a historical artifact and, for want of a better one, has stuck.
Let's see if your system has any broken files.
    $ scanelf -lpqt
    TEXTREL  /usr/lib/opengl/xorg-x11/lib/libGL.so.1.2
    TEXTREL  /usr/lib/libSDL-1.2.so.0.7.2
    TEXTREL  /usr/lib/libdv.so.4.0.2
    TEXTREL  /usr/lib/libsmpeg-0.4.so.0.1.3
    TEXTREL  /usr/lib/libOSMesa.so.4.0
    TEXTREL  /usr/lib/libxvidcore.so.4.1
Ideally, scanelf should not display anything, but on an x86 system, this is
rarely the case. Here we can see six libraries with TEXTRELs in them.
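For the curious, the ELF specification lets a linker announce text relocations in the dynamic section, either with a `DT_TEXTREL` entry or with the `DF_TEXTREL` bit inside `DT_FLAGS`. The C sketch below is not taken from scanelf; it assumes the dynamic entries have already been read into an array, and `struct dyn_entry`, `dynamic_has_textrel` and the `DYN_*` names are invented for illustration (only the numeric values come from the ELF specification).

```c
#include <stddef.h>
#include <stdint.h>

/* Tag and flag values as given in the ELF specification
 * (DT_NULL, DT_TEXTREL, DT_FLAGS and DF_TEXTREL). */
enum { DYN_NULL = 0, DYN_TEXTREL = 22, DYN_FLAGS = 30 };
enum { DYN_FLAG_TEXTREL = 0x4 };

/* Hypothetical stand-in for a 32-bit dynamic entry: a tag plus a value. */
struct dyn_entry {
    int32_t  tag;
    uint32_t val;
};

/* Return 1 if the dynamic section announces text relocations, else 0.
 * Scanning stops at the terminating DYN_NULL entry or after max_entries. */
int dynamic_has_textrel(const struct dyn_entry *dyn, size_t max_entries)
{
    for (size_t i = 0; i < max_entries && dyn[i].tag != DYN_NULL; i++) {
        if (dyn[i].tag == DYN_TEXTREL)
            return 1;
        if (dyn[i].tag == DYN_FLAGS && (dyn[i].val & DYN_FLAG_TEXTREL))
            return 1;
    }
    return 0;
}
```

A real tool would additionally have to locate the dynamic section in the file and cope with 64-bit and foreign-endian objects, which this sketch leaves out.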
To quickly find out what package these files come from, Gentoo users can
    $ qfile `scanelf -qylpF%F#t`
    media-libs/libdv (/usr/lib/libdv.so.4.0.2)
    media-libs/libsdl (/usr/lib/libSDL-1.2.so.0.7.2)
    media-libs/smpeg (/usr/lib/libsmpeg-0.4.so.0.1.3)
    media-libs/xvid (/usr/lib/libxvidcore.so.4.1)
    x11-base/xorg-x11 (/usr/lib/opengl/xorg-x11/lib/libGL.so.1.2)
    x11-base/xorg-x11 (/usr/lib/libOSMesa.so.4.0)
Now that we know the offenders, we have a choice. We can file a bug upstream
(who generally don't care unless you can provide a fix), file a bug in the
Sometimes you may come across a package which contains a mountain of TEXTRELs with seemingly no relation to assembler. This may simply be because the objects were not properly compiled with the appropriate PIC flag. The fix is quite simple: make sure every object file that is linked into the final shared object is compiled with the appropriate PIC flag (typically -fPIC).
For example, let's look at the silc-plugin package. It builds a few modules, but only some of the objects linked into the final libsilc_core.so module are compiled with -fPIC. The output of scanelf here is quite extensive!

    $ scanelf -qT /usr/lib/irssi/modules/libsilc_core.so | wc -l
    10734
    $ scanelf -qT /usr/lib/irssi/modules/libsilc_core.so
    ...
    libsilc_core.so: silc_client_ftp_ask_name [0xD542C] in silc_client_receive_new_id [0xD5380]
    libsilc_core.so: silc_client_run_one [0xD55CA] in silc_client_receive_new_id [0xD5380]
    libsilc_core.so: silc_id_payload_parse [0xD5842] in silc_client_packet_parse_type [0xD57B0]
    libsilc_core.so: fgetc@@GLIBC_2.0 [0xD5857] in silc_client_packet_parse_type [0xD57B0]
    ...
A TEXTREL on glibc's fgetc() function!? Either people are calling fgetc() from assembler (and should be shot), or something else is going on. A good rule of thumb is that if it seems like just about every function/variable reference causes a TEXTREL and it is all done in C code, then the file was not built as PIC. Just review the build output and see if the command to compile it was invoked with -fPIC. If not, go fix the build system as you do not need to dig into the source. Dodged the bullet this time!
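When the build output is too noisy to read, one hedged alternative is to let the compiler report how each object was built. gcc and clang define `__PIC__` when invoked with -fpic or -fPIC, so a tiny helper (the name `built_as_pic` is made up here) can expose that fact:

```c
/* Report whether this translation unit was compiled as position
 * independent code. Relies on the __PIC__ macro that gcc and clang
 * predefine when -fpic or -fPIC is in effect. */
int built_as_pic(void)
{
#ifdef __PIC__
    return 1;
#else
    return 0;
#endif
}
```

A stricter variant would be an `#error` under `#ifndef __PIC__` in sources that are only ever meant to end up in a shared object, so a missing flag stops the build instead of producing TEXTRELs.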
So we have identified some broken libraries, and we want to fix them. The
trouble is, shared library code can be huge. A library can have thousands of
functions which come from thousands of object files and thousands of source
code files which total megabytes in size (source code and compiled objects).
Where the hell do we start!? Once again, Mighty Mouse^W^W
[The output has been truncated from about 35 lines]

    $ scanelf -qT /usr/lib/libsmpeg-0.4.so.0.1.3
    libsmpeg-0.4.so.0.1.3: (memory/fake?) [0x2FB3C] in cpu_flags [0x2FB10]
    libsmpeg-0.4.so.0.1.3: (memory/fake?) [0x2FB42] in cpu_flags [0x2FB10]
    libsmpeg-0.4.so.0.1.3: (memory/fake?) [0x2FB55] in IDCT_mmx [0x2FB48]
    libsmpeg-0.4.so.0.1.3: (memory/fake?) [0x2FB84] in IDCT_mmx [0x2FB48]
    /usr/lib/libsmpeg-0.4.so.0.1.3
The output here tells us that the
    $ objdump -d /usr/lib/libsmpeg-0.4.so.0.1.3
    ...
       2fb0f:       90                      nop
    0002fb10 <cpu_flags>:
       2fb10:       9c                      pushf
       2fb11:       58                      pop    %eax
    ...
       2fb32:       60                      pusha
       2fb33:       b8 01 00 00 00          mov    $0x1,%eax
       2fb38:       0f a2                   cpuid
       2fb3a:       89 15 d0 d3 03 00       mov    %edx,0x3d3d0
       2fb40:       61                      popa
       2fb41:       a1 d0 d3 03 00          mov    0x3d3d0,%eax
       2fb46:       c3                      ret
       2fb47:       90                      nop
    ...
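The offsets reported by scanelf can be matched against this listing by hand: a TEXTREL points at the 32-bit absolute address embedded in an instruction, not at the first byte of the instruction. The C sketch below only illustrates that arithmetic; `abs32_operand` and `reloc_site` are hypothetical helper names, not part of scanelf or objdump.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the little-endian 32-bit absolute address that follows the
 * leading bytes (opcode, and ModRM if present) of an x86 instruction. */
uint32_t abs32_operand(const uint8_t *insn, size_t lead_len)
{
    const uint8_t *p = insn + lead_len;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Address of the embedded operand itself, i.e. the spot a text
 * relocation would have to patch. */
uint32_t reloc_site(uint32_t insn_addr, size_t lead_len)
{
    return insn_addr + (uint32_t)lead_len;
}
```

For the bytes `89 15 d0 d3 03 00` at 0x2fb3a, two leading bytes give an operand of 0x3d3d0 located at 0x2fb3c, which is the first offset scanelf printed for cpu_flags. The `a1 d0 d3 03 00` instruction at 0x2fb41 has one leading byte, putting its operand at 0x2fb42, the second offset.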
As you can see here, the two lines picked out in the body of
[The output has been truncated from about 180 lines]

    $ scanelf -qT /usr/lib/libdv.so.4.0.2
    libdv.so.4.0.2: (memory/fake?) [0x14AA9] in dv_parse_ac_coeffs_pass0 [0x14A84]
    libdv.so.4.0.2: (memory/fake?) [0x14C28] in dv_parse_ac_coeffs_pass0 [0x14A84]
    libdv.so.4.0.2: (memory/fake?) [0x14C8A] in dv_parse_video_segment [0x14C6F]
    libdv.so.4.0.2: (memory/fake?) [0x14CA6] in dv_parse_video_segment [0x14C6F]
    libdv.so.4.0.2: (memory/fake?) [0x15248] in _dv_idct_block_mmx [0x15210]
    libdv.so.4.0.2: (memory/fake?) [0x152BE] in _dv_idct_block_mmx [0x15210]
    libdv.so.4.0.2: (memory/fake?) [0x1583D] in _dv_dct_88_block_mmx [0x157F8]
    libdv.so.4.0.2: (memory/fake?) [0x15847] in _dv_dct_88_block_mmx [0x157F8]
    libdv.so.4.0.2: (memory/fake?) [0x15F91] in _dv_dct_248_block_mmx [0x15F58]
    libdv.so.4.0.2: (memory/fake?) [0x15FE6] in _dv_dct_248_block_mmx [0x15F58]
    libdv.so.4.0.2: (memory/fake?) [0x163D3] in _dv_rgbtoycb_mmx [0x163C8]
    libdv.so.4.0.2: (memory/fake?) [0x163DD] in _dv_rgbtoycb_mmx [0x163C8]
    libdv.so.4.0.2: dv_vlc_class_index_mask [0x149A7] in dv_decode_vlc [0x14998]
    libdv.so.4.0.2: dv_vlc_class_index_rshift [0x149B0] in dv_decode_vlc [0x14998]
    libdv.so.4.0.2: dv_vlc_classes [0x149B9] in dv_decode_vlc [0x14998]
    libdv.so.4.0.2: dv_vlc_index_mask [0x149C4] in dv_decode_vlc [0x14998]
    libdv.so.4.0.2: sign_mask [0x149EB] in dv_decode_vlc [0x14998]
    libdv.so.4.0.2: sign_mask [0x14A5D] in __dv_decode_vlc [0x14A1C]
    libdv.so.4.0.2: sign_mask [0x14B82] in dv_parse_ac_coeffs_pass0 [0x14A84]
    libdv.so.4.0.2: dv_vlc_class_lookup5 [0x14A2F] in __dv_decode_vlc [0x14A1C]
    libdv.so.4.0.2: dv_parse_ac_coeffs_pass0 [0x14E03] in dv_parse_video_segment [0x14C6F]
    libdv.so.4.0.2: dv_parse_ac_coeffs [0x14E51] in dv_parse_video_segment [0x14C6F]
    libdv.so.4.0.2: dv_quant_offset [0x14E69] in _dv_quant_88_inverse_x86 [0x14E5C]
    libdv.so.4.0.2: dv_quant_offset [0x14FB3] in _dv_quant_x86 [0x14FA4]
    /usr/lib/libdv.so.4.0.2
Again, we can see that many functions (like
[The output has been truncated from about 50 lines]

    $ scanelf -qT /usr/lib/libSDL-1.2.so.0.7.2
    libSDL-1.2.so.0.7.2: (memory/fake?) [0x4E213] in _ConvertMMXpII32_24RGB888 [0x4E210]
    libSDL-1.2.so.0.7.2: (memory/fake?) [0x4E29E] in _ConvertMMXpII32_16RGB565 [0x4E29B]
    libSDL-1.2.so.0.7.2: (memory/fake?) [0x4E3F6] in _ConvertMMXpII32_16BGR555 [0x4E3F3]
    libSDL-1.2.so.0.7.2: (memory/fake?) [0x4E402] in _ConvertMMXpII32_16RGB555 [0x4E3FF]
    libSDL-1.2.so.0.7.2: (memory/fake?) [0x4E555] in _Hermes_X86_CPU [0x4E529]
    libSDL-1.2.so.0.7.2: _copy_row [0x316A2] in SDL_SoftStretch [0x313C0]
    libSDL-1.2.so.0.7.2: _mmxreturn [0x4E4FB] in _ConvertMMXpII32_16RGB555 [0x4E3FF]
    libSDL-1.2.so.0.7.2: _x86return [0x4E590] in _ConvertX86p16_16BGR565 [0x4E560]
    libSDL-1.2.so.0.7.2: _x86return [0x4EE99] in _ConvertX86p32_16BGR555 [0x4EDCA]
    libSDL-1.2.so.0.7.2: _x86return [0x4EF4D] in _ConvertX86p32_8RGB332 [0x4EE9D]
    /usr/lib/libSDL-1.2.so.0.7.2
Doesn't seem to be anything new here. Poor memory usage in functions like
We've identified the functions and sometimes the variables which are causing us such headaches. Before we can actually fix them though, we have to narrow down the source code to the offending lines. Since we know the function names and either the symbol name or a relative position in the function, we should be able to focus our efforts quite easily.
Let's start with libsmpeg. We know that both the
    $ tar zxf smpeg-0.4.4.tar.gz
    $ cd smpeg-0.4.4

[Find the cpu_flags function]

    $ grep -Rl cpu_flags *
    video/mmxflags_asm.S
    video/parseblock.cpp
    $ grep cpu_flags video/mmxflags_asm.S
    .globl cpu_flags
            .type cpu_flags,@function    <-- here is what we want
    cpu_flags:
            jz cpu_flags.L1    # Processor is 386
            je cpu_flags.L1
    cpu_flags.L1:
            .size cpu_flags,.Lfe1-cpu_flags

[Find the IDCT_mmx function]

    $ grep -Rl IDCT_mmx *
    video/parseblock.cpp
    video/mmxidct_asm.S
    $ grep IDCT_mmx video/mmxidct_asm.S
    .globl IDCT_mmx
            .type IDCT_mmx,@function    <-- here is what we want
    IDCT_mmx:
            .size IDCT_mmx,.Lfe1-IDCT_mmx
As we suspected, both the
    $ grep -A 8 cpuid video/mmxflags_asm.S
            cpuid
            movl %edx,flags
            popa
            movl flags,%eax
    cpu_flags.L1:
In GNU assembler, registers are prefixed with a
    $ grep -C 2 flags video/mmxflags_asm.S
    .data
            .align 16
            .type flags,@object
    flags:  .long 0
    .text
Seems
If we analyze the
[Local variables]

    .data
            .align 8
            .type   x4546454645464546,@object
            .size   x4546454645464546,8
    x4546454645464546:
            .long   0x45464546,0x45464546
            .align 8
            .type   x61f861f861f861f8,@object
            .size   x61f861f861f861f8,8
    x61f861f861f861f8:
            .long   0x61f861f8,0x61f861f8
            .align 8
            .type   scratch1,@object
            .size   scratch1,8
    scratch1:
            .long   0,0

[Absolute memory references]

    .text
    ...
            psraw $1, %mm5                      /* t90=t93 */
            pmulhw x4546454645464546,%mm0       /* V35 */
            psraw $1, %mm2                      /* t97 */
    ...
            psubsw %mm2, %mm1                   /* V32 ; free mm2 */
            pmulhw x61f861f861f861f8,%mm1       /* V36 */
            psllw $1, %mm7                      /* t107 */
    ...
            movq 8*3(%esi), %mm7
            psraw $4, %mm4
            movq %mm2, scratch1                 /* out1 */

Again, before we jump into how to fix these, let's analyze a few more source files to get a better handle on identifying problematic code.

[objdump of _ConvertMMXpII32_24RGB888 memory reference]

    0004e210 <_ConvertMMXpII32_24RGB888>:
       4e210:       0f 6f 35 50 55 05 00    movq   0x55550,%mm6
       4e217:       0f ef ff                pxor   %mm7,%mm7

[_ConvertMMXpII32_24RGB888 is defined in src/hermes/mmxp2_32.asm]

    SECTION .data
    ALIGN 8
    ;; Constants for conversion routines
    mmx32_rgb888_mask dd 00ffffffh,00ffffffh
    ...
    SECTION .text
    _ConvertMMXpII32_24RGB888:                      ; start of function 0x4E210
        ; set up mm6 as the mask, mm7 as zero
        movq mm6, qword [mmx32_rgb888_mask]         ; memory reference 0x4E213
        pxor mm7, mm7
Simple enough, the
[SDL_SoftStretch is defined in src/video/SDL_stretch.c]

    int SDL_SoftStretch(SDL_Surface *src, SDL_Rect *srcrect,
                        SDL_Surface *dst, SDL_Rect *dstrect)
    {
    ...
    #ifdef __GNUC__
        __asm__ __volatile__ (
            "call _copy_row"
            : "=&D" (u1), "=&S" (u2)
            : "0" (dstp), "1" (srcp)
            : "memory" );
    #else

Another straightforward bug. An absolute reference to the
Now we know what broken code looks like. We can point out issues in code and confidently declare "that crap is broken". While this is a good thing, it certainly doesn't help much if no one knows how it's supposed to be written. Let's start with some rules of thumb.
General rules
x86-specific rules
arch | register |
---|---|
If you come across code which uses the PIC register in some inline assembly,
one fix may be to simply use a different register. For example, the x86
architecture has 6 general purpose registers (
A cleaner fix might be to just let gcc allocate the registers accordingly. If
the inline assembly doesn't actually care which registers it uses, change the
references from
Or, if the assembly uses an instruction which always clobbers
If all else fails, you can fall back to the slow push/pop
    /* change this code from */
    asm("mov %0, %%eax\n\t"
        "mov %1, %%ebx\n\t"
        "add %%eax, %%ebx"
        : : "m"(input1), "m"(input2) : "eax", "ebx");

    /* to this functionally equivalent version */
    asm("mov %0, %%eax\n\t"
        "mov %1, %%ecx\n\t"
        "add %%eax, %%ecx"
        : : "m"(input1), "m"(input2) : "eax", "ecx");
    /* change this code from */
    asm("mov %2, %%eax\n\t"
        "mov %3, %%ebx\n\t"
        "add %%eax, %%ebx"
        : "=a"(output1), "=b"(output2)
        : "m"(input1), "m"(input2));

    /* to this functionally equivalent version; the outputs are marked
       early-clobber (&) because they are written before the last input is read */
    asm("mov %2, %0\n\t"
        "mov %3, %1\n\t"
        "add %0, %1"
        : "=&r"(output1), "=&r"(output2)
        : "m"(input1), "m"(input2));
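As a compact, compilable illustration of letting gcc pick the registers, here is a hedged sketch; the function name `add_u32` is invented for this example, and a plain C fallback is assumed for non-x86 targets. No register is named anywhere, so the compiler is free to keep the PIC register out of it.

```c
#include <stdint.h>

/* Add two 32-bit values with inline assembly that names no register.
 * The "0" constraint ties input a to the same register as output %0,
 * and "r" lets the compiler choose any free register for b. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t sum;
    __asm__("addl %2, %0"
            : "=r"(sum)
            : "0"(a), "r"(b)
            : "cc");            /* addl modifies the flags */
    return sum;
#else
    return a + b;               /* plain C fallback on other targets */
#endif
}
```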
    asm("cpuid" : : : "eax", "ebx", "ecx", "edx");

    /* can be written to hide ebx */
    asm("movl %%ebx, %%edi\n\t"
        "cpuid\n\t"
        "movl %%edi, %%ebx"
        : : : "eax", "ecx", "edx", "edi");

    /* or a slower version using the stack */
    asm("pushl %%ebx\n\t"
        "cpuid\n\t"
        "popl %%ebx"
        : : : "eax", "ecx", "edx");

A lot of x86 MMX/SSE code loads bitmasks from local variables because it needs to fill a register which is larger (MMX: 64 bits, SSE: 128 bits) than the native word size (x86: 32 bits). The mask is defined as consecutive bytes in memory and the CPU then loads the data from that memory region.
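Where the only goal is to query CPUID from C, another option is to skip the hand-written asm and use the `<cpuid.h>` header shipped with gcc and clang, which leaves the question of preserving `ebx` to the compiler. This is a sketch under that assumption; `cpu_has_mmx` is a made-up name and the -1 return for non-x86 builds is a choice made for this example.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Return 1 if the CPU reports MMX, 0 if it does not,
 * and -1 where CPUID leaf 1 cannot be queried. */
int cpu_has_mmx(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 when the requested leaf is unsupported */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (int)((edx >> 23) & 1);   /* leaf 1, EDX bit 23 is MMX */
#else
    return -1;
#endif
}
```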
One way to get around this is by being creative with the stack. Rather than
use an absolute memory reference for the mask, push a bunch of 32bit values
onto the stack and use the address specified by the
    /* Load masks from memory (causes TEXTRELs) */
    .data
    m0X000000:
            .byte   0, 0, 0, 0, 0, 0, 255, 0
    .text
            movq    m0X000000, %mm5

    /* Load mask from stack (no TEXTRELs) */
            pushl   $0x00FF0000
            pushl   $0x00000000
            movq    (%esp), %mm5
            addl    $8, %esp        /* immediate needs the $ prefix; a bare 8 would be a memory operand */
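The push order is easy to get backwards, so it can help to check the byte layout on paper or in a few lines of C. The sketch below is only a model of a descending, little-endian stack; `push32` and `stack_mask_matches` are hypothetical names and nothing here touches the real stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of pushl: the stack pointer drops by 4, then the value is
 * stored in little-endian byte order at the new top of stack. */
size_t push32(uint8_t *stack, size_t sp, uint32_t value)
{
    sp -= 4;
    stack[sp + 0] = (uint8_t)(value);
    stack[sp + 1] = (uint8_t)(value >> 8);
    stack[sp + 2] = (uint8_t)(value >> 16);
    stack[sp + 3] = (uint8_t)(value >> 24);
    return sp;
}

/* Push the two halves in the order used above and compare the eight
 * bytes at the final stack pointer with the expected mask bytes. */
int stack_mask_matches(const uint8_t expected[8])
{
    uint8_t stack[16] = {0};
    size_t sp = sizeof stack;

    sp = push32(stack, sp, 0x00FF0000u);  /* high dword goes first */
    sp = push32(stack, sp, 0x00000000u);  /* low dword ends up at (%esp) */

    for (size_t i = 0; i < 8; i++)
        if (stack[sp + i] != expected[i])
            return 0;
    return 1;
}
```

Feeding it the bytes from the `.byte 0, 0, 0, 0, 0, 0, 255, 0` definition yields a match, which confirms that the high half of the mask has to be pushed first.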
A lot of inline assembly is written with the symbol names placed right in the code. Rather than trying to write custom code to handle PIC in assembly, just let gcc worry about it. Pass in the symbol via the input operand list as a memory constraint ("m") and gcc will handle all the rest.
    unsigned long long a_mmx_mask = 0xf8007c00ffea0059ULL;

    void somefunction()
    {
        /* Common (but incorrect) method for loading masks */
        asm("pmullw a_mmx_mask, %%mm0" : : );

        /* The correct way is to let gcc do it */
        asm("pmullw %0, %%mm0" : : "m"(a_mmx_mask));
    }

If you get a warning/error about one of the memory inputs needing to be an lvalue, then this usually means you're trying to pass in a pointer to an array/structure rather than the memory location itself. Fixing this may be as simple as dereferencing the variable in the constraint list rather than in the assembly itself.
Handwritten assembly sometimes needs to access variables (whether they be
local or global). Since none of the previous tricks will work, you just need
to grit your teeth and dig in to write real PIC references yourself using
the GOT. Make sure you keep in mind the first rule of thumb: Do not mix PIC
and non-PIC object code. This will probably require that the handwritten
assembly be preprocessed before it is assembled, so an assembly source file
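A small sketch of the dereferencing fix, assuming a hypothetical lookup table `mask_table` and helper `load_mask` (neither comes from any of the packages discussed here). The memory operand is `*entry`, the table element itself; passing `&mask_table[0]` instead would not be an lvalue and gcc would typically reject it.

```c
#include <stdint.h>

/* Hypothetical table of masks, used only for this illustration. */
static const uint32_t mask_table[2] = { 0x00ffffffu, 0xff00ff00u };

/* Load one table entry through an "m" constraint so the compiler
 * generates the (PIC-safe) address computation on its own. */
uint32_t load_mask(unsigned idx)
{
    const uint32_t *entry = &mask_table[idx & 1];
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t out;
    /* "m"(*entry) names the memory being read, not the pointer to it */
    __asm__("movl %1, %0" : "=r"(out) : "m"(*entry));
    return out;
#else
    return *entry;              /* plain C fallback on other targets */
#endif
}
```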
with a
Also keep in mind that @GOTOFF addresses the variable itself (as an offset from the GOT base held in the PIC register), while @GOT addresses the GOT slot that holds a pointer to the variable. So accessing a variable with @GOT requires two steps: load the pointer, then dereference it.

    #ifdef __PIC__
    # undef __i686    /* gcc builtin define gets in our way */
    # define MUNG_LOCAL(sym)   sym ## @GOTOFF(%ebx)
    # define MUNG_EXTERN(sym)  sym ## @GOT(%ebx)
    # define DEREF_EXTERN(reg) movl (reg), reg
    # define INIT_PIC() \
        call __i686.get_pc_thunk.bx ; \
        addl $_GLOBAL_OFFSET_TABLE_, %ebx
    #else
    # define MUNG_LOCAL(sym)   sym
    # define MUNG_EXTERN(sym)  sym
    # define DEREF_EXTERN(reg)
    # define INIT_PIC()
    #endif
    ...
    some_function:
        ...
        /* needs to be before first memory reference */
        INIT_PIC()
        ...
        movl MUNG_EXTERN(some_external_variable), %eax
        DEREF_EXTERN(%eax)
        ...
        movl %eax, MUNG_LOCAL(some_local_variable)
        ...
    #ifdef __PIC__
        .section .gnu.linkonce.t.__i686.get_pc_thunk.bx,"ax",@progbits
    .globl __i686.get_pc_thunk.bx
        .hidden  __i686.get_pc_thunk.bx
        .type    __i686.get_pc_thunk.bx,@function
    __i686.get_pc_thunk.bx:
        movl (%esp), %ebx
        ret
    #endif
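The one-step versus two-step difference can be pictured with a toy model in C. This is only an analogy, not what the assembler or linker literally does: `got_slot` stands in for a GOT entry, and all names are invented for the illustration.

```c
/* Toy model: this "module" owns local_counter and reaches
 * external_value only through a pointer kept in a table slot. */
static int local_counter = 7;             /* data inside this module */
static int external_value = 42;           /* pretend it lives in another object */
static int *got_slot = &external_value;   /* what the dynamic linker would fill in */

/* @GOTOFF style: a fixed offset from the base reaches the variable
 * directly, so a single load is enough. */
int read_local(void)
{
    return local_counter;
}

/* @GOT style: first load the pointer from the slot, then load the
 * value it points at. */
int read_external(void)
{
    int *addr = got_slot;   /* step 1: fetch the address from the slot */
    return *addr;           /* step 2: dereference it */
}
```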
Since we hide the PIC details behind the preprocessor define
The
So if the previous code snippets were broken, what should they look like you may wonder. Well let's find out.
[Non-PIC Version]

            .type flags,@object
    flags:  .long 0
    ...
            pusha
            movl $1,%eax
            cpuid
            movl %edx,flags
            popa
            movl flags,%eax

[PIC Version]

            pushl %ebx
            movl $1,%eax
            cpuid
            movl %edx,%eax
            popl %ebx

[Non-PIC Version]

            pmulhw x5a825a825a825a82, %mm1

[PIC Version]

    #ifdef __PIC__
    # undef __i686    /* gcc define gets in our way */
            call __i686.get_pc_thunk.bx
            addl $_GLOBAL_OFFSET_TABLE_, %ebx
    #endif
    ...
            pmulhw x5a825a825a825a82@GOTOFF(%ebx), %mm1
    ...
    #ifdef __PIC__
            .section .gnu.linkonce.t.__i686.get_pc_thunk.bx,"ax",@progbits
    .globl __i686.get_pc_thunk.bx
            .hidden  __i686.get_pc_thunk.bx
            .type    __i686.get_pc_thunk.bx,@function
    __i686.get_pc_thunk.bx:
            movl (%esp), %ebx
            ret
    #endif

[Non-PIC Version]

    mmx32_rgb888_mask dd 00ffffffh,00ffffffh
    ...
        movq mm6, qword [mmx32_rgb888_mask]

[PIC Version]

    %macro _push_immq_mask 1
        push dword %1
        push dword %1
    %endmacro
    %macro load_immq 2
        _push_immq_mask %2
        movq %1, [esp]
    %endmacro
    %define mmx32_rgb888_mask 00ffffffh
    ...
        load_immq mm6, mmx32_rgb888_mask
        CLEANUP_IMMQ_LOADS(1)

[Non-PIC Version]

    __asm__ __volatile__ (
        "call _copy_row"
        : "=&D" (u1), "=&S" (u2)
        : "0" (dstp), "1" (srcp)
        : "memory" );

[PIC Version]

    __asm__ __volatile__ (
        "call *%4"
        : "=&D" (u1), "=&S" (u2)
        : "0" (dstp), "1" (srcp), "r" (&_copy_row)
        : "memory" );