annotate libpostproc/postprocess_template.c @ 2841:bceeca1bb30f libavcodec

vbr audio encode patch by (Justin Ruggles: jruggle, earthlink net)
with changes by me:
int->float as video uses float too;
remove silent clipping to some per-codec range, this should result in an error instead;
remove change to utils.c as it's inconsistent with video
author michael
date Sun, 21 Aug 2005 20:27:00 +0000
parents 49da251f2608
children ef2149182f1c
rev   line source
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1 /*
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
2 Copyright (C) 2001-2002 Michael Niedermayer (michaelni@gmx.at)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
4 This program is free software; you can redistribute it and/or modify
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
5 it under the terms of the GNU General Public License as published by
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
6 the Free Software Foundation; either version 2 of the License, or
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
7 (at your option) any later version.
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
8
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
9 This program is distributed in the hope that it will be useful,
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
10 but WITHOUT ANY WARRANTY; without even the implied warranty of
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
11 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
12 GNU General Public License for more details.
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
13
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
14 You should have received a copy of the GNU General Public License
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
15 along with this program; if not, write to the Free Software
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
16 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
17 */
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
18
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
19 /**
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
20 * @file postprocess_template.c
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
21 * mmx/mmx2/3dnow postprocess code.
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
22 */
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
23
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
24
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
25 #ifdef ARCH_X86_64
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
26 # define REGa rax
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
27 # define REGc rcx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
28 # define REGd rdx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
29 # define REG_a "rax"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
30 # define REG_c "rcx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
31 # define REG_d "rdx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
32 # define REG_SP "rsp"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
33 # define ALIGN_MASK "$0xFFFFFFFFFFFFFFF8"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
34 #else
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
35 # define REGa eax
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
36 # define REGc ecx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
37 # define REGd edx
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
38 # define REG_a "eax"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
39 # define REG_c "ecx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
40 # define REG_d "edx"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
41 # define REG_SP "esp"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
42 # define ALIGN_MASK "$0xFFFFFFF8"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
43 #endif
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
44
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
45
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
46 #undef PAVGB
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
47 #undef PMINUB
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
48 #undef PMAXUB
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
49
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
50 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
51 #define REAL_PAVGB(a,b) "pavgb " #a ", " #b " \n\t"
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
52 #elif defined (HAVE_3DNOW)
2295
rfelker
parents: 2293
diff changeset
53 #define REAL_PAVGB(a,b) "pavgusb " #a ", " #b " \n\t"
104
9607b48e2c2d Cleanup:
arpi
parents: 102
diff changeset
54 #endif
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
55 #define PAVGB(a,b) REAL_PAVGB(a,b)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
56
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
57 #ifdef HAVE_MMX2
2c469e390117 dering in c
michael
parents: 133
diff changeset
58 #define PMINUB(a,b,t) "pminub " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
59 #elif defined (HAVE_MMX)
2c469e390117 dering in c
michael
parents: 133
diff changeset
60 #define PMINUB(b,a,t) \
2c469e390117 dering in c
michael
parents: 133
diff changeset
61 "movq " #a ", " #t " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
62 "psubusb " #b ", " #t " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
63 "psubb " #t ", " #a " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
64 #endif
2c469e390117 dering in c
michael
parents: 133
diff changeset
65
2c469e390117 dering in c
michael
parents: 133
diff changeset
66 #ifdef HAVE_MMX2
2c469e390117 dering in c
michael
parents: 133
diff changeset
67 #define PMAXUB(a,b) "pmaxub " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
68 #elif defined (HAVE_MMX)
2c469e390117 dering in c
michael
parents: 133
diff changeset
69 #define PMAXUB(a,b) \
2c469e390117 dering in c
michael
parents: 133
diff changeset
70 "psubusb " #a ", " #b " \n\t"\
2c469e390117 dering in c
michael
parents: 133
diff changeset
71 "paddb " #a ", " #b " \n\t"
2c469e390117 dering in c
michael
parents: 133
diff changeset
72 #endif
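The plain-MMX fallbacks for PMINUB and PMAXUB above rely on unsigned saturating subtraction (psubusb). The following is a minimal scalar sketch of the same identities for a single byte lane. The helper names `sat_sub_u8`, `min_u8` and `max_u8` are hypothetical and do not appear in the file.

```c
#include <stdint.h>

/* Unsigned saturating subtract on one byte, as psubusb does per lane. */
static uint8_t sat_sub_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)(x > y ? x - y : 0);
}

/* min(a, b) = a - sat(a - b).
 * Mirrors the fallback sequence: movq a,t ; psubusb b,t ; psubb t,a. */
static uint8_t min_u8(uint8_t a, uint8_t b)
{
    uint8_t t = sat_sub_u8(a, b);   /* t = a - b if a > b, else 0 */
    return (uint8_t)(a - t);        /* a - (a - b) = b, or a - 0 = a */
}

/* max(a, b) = sat(b - a) + a.
 * Mirrors the fallback sequence: psubusb a,b ; paddb a,b. */
static uint8_t max_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(sat_sub_u8(b, a) + a);
}
```

Because the saturated difference is never larger than the value it is subtracted from or added back to, the wrapping psubb/paddb in the macros cannot overflow here, which is why no saturating add is needed.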
2c469e390117 dering in c
michael
parents: 133
diff changeset
73
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
74 //FIXME? |255-0| = 1 (shouldn't be a problem ...)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
75 #ifdef HAVE_MMX
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
76 /**
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
77 * Check if the middle 8x8 Block in the given 8x16 block is flat
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
78 */
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
79 static inline int RENAME(vertClassify)(uint8_t src[], int stride, PPContext *c){
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
80 int numEq= 0, dcOk;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
81 src+= stride*4; // src points to the beginning of the 8x8 Block
119
b2f0e40866b1 optimizations (+2% speedup)
michael
parents: 118
diff changeset
82 asm volatile(
1331
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
83 "movq %0, %%mm7 \n\t"
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
84 "movq %1, %%mm6 \n\t"
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
85 : : "m" (c->mmxDcOffset[c->nonBQP]), "m" (c->mmxDcThreshold[c->nonBQP])
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
86 );
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
87
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
88 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
89 "lea (%2, %3), %%"REG_a" \n\t"
119
b2f0e40866b1 optimizations (+2% speedup)
michael
parents: 118
diff changeset
90 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
91 // %1 eax eax+%2 eax+2%2 %1+4%2 ecx ecx+%2 ecx+2%2 %1+8%2 ecx+4%2
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
92
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
93 "movq (%2), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
94 "movq (%%"REG_a"), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
95 "movq %%mm0, %%mm3 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
96 "movq %%mm0, %%mm4 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
97 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
98 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
99 "psubb %%mm1, %%mm0 \n\t" // mm0 = difference
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
100 "paddb %%mm7, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
101 "pcmpgtb %%mm6, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
102
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
103 "movq (%%"REG_a",%3), %%mm2 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
104 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
105 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
106 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
107 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
108 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
109 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
110
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
111 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
112 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
113 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
114 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
115 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
116 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
117 "paddb %%mm2, %%mm0 \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
118
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
119 "lea (%%"REG_a", %3, 4), %%"REG_a" \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
120
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
121 "movq (%2, %3, 4), %%mm2 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
122 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
123 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
124 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
125 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
126 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
127 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
128
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
129 "movq (%%"REG_a"), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
130 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
131 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
132 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
133 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
134 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
135 "paddb %%mm2, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
136
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
137 "movq (%%"REG_a", %3), %%mm2 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
138 PMAXUB(%%mm2, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
139 PMINUB(%%mm2, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
140 "psubb %%mm2, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
141 "paddb %%mm7, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
142 "pcmpgtb %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
143 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
144
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
145 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
146 PMAXUB(%%mm1, %%mm4)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
147 PMINUB(%%mm1, %%mm3, %%mm5)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
148 "psubb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
149 "paddb %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
150 "pcmpgtb %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
151 "paddb %%mm2, %%mm0 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
152 "psubusb %%mm3, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
153
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
154 " \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
155 #ifdef HAVE_MMX2
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
156 "pxor %%mm7, %%mm7 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
157 "psadbw %%mm7, %%mm0 \n\t"
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
158 #else
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
159 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
160 "psrlw $8, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
161 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
162 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
163 "psrlq $16, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
164 "paddb %%mm1, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
165 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
166 "psrlq $32, %%mm0 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
167 "paddb %%mm1, %%mm0 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
168 #endif
1331
cf65e69400ec gcc 2.95 workaround
michaelni
parents: 1327
diff changeset
169 "movq %4, %%mm7 \n\t" // QP,..., QP
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
170 "paddusb %%mm7, %%mm7 \n\t" // 2QP ... 2QP
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
171 "psubusb %%mm7, %%mm4 \n\t" // Diff <= 2QP -> 0
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
172 "packssdw %%mm4, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
173 "movd %%mm0, %0 \n\t"
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
174 "movd %%mm4, %1 \n\t"
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
175
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
176 : "=r" (numEq), "=r" (dcOk)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
177 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
178 : "%"REG_a
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
179 );
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
180
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
181 numEq= (-numEq) &0xFF;
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
182 if(numEq > c->ppMode.flatnessThreshold){
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
183 if(dcOk) return 0;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
184 else return 1;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
185 }else{
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
186 return 2;
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
187 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
188 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
189 #endif
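As a reading aid, here is a hedged scalar model of what the MMX vertClassify above appears to compute for one 8x8 block. It is a sketch, not the project's C fallback: the function name, the flat parameter list, and passing a single `off`/`thr` byte in place of the packed `mmxDcOffset`/`mmxDcThreshold` values are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical scalar model of the classification.
 * block: first of 8 rows, 8 pixels wide.
 * off, thr: one byte lane of the packed offset/threshold constants.
 * qp: one byte lane of the packed QP value. */
static int vert_classify_model(const uint8_t *block, int stride,
                               uint8_t off, int8_t thr,
                               int qp, int flatness_threshold)
{
    int num_eq = 0;          /* count of "nearly equal" vertical neighbours */
    int range_too_big = 0;   /* set if max-min of any column exceeds 2*QP   */
    int two_qp = 2 * qp > 255 ? 255 : 2 * qp;   /* paddusb saturates */

    for (int x = 0; x < 8; x++) {
        int mn = block[x], mx = block[x];
        for (int y = 0; y < 7; y++) {
            int a = block[y * stride + x];
            int b = block[(y + 1) * stride + x];
            /* psubb + paddb wrap modulo 256; pcmpgtb compares as signed. */
            unsigned v = (unsigned)(a - b + off) & 0xFFu;
            int d = v >= 128 ? (int)v - 256 : (int)v;
            if (d > thr)
                num_eq++;
            if (b < mn) mn = b;
            if (b > mx) mx = b;
        }
        if (mx - mn > two_qp)
            range_too_big = 1;
    }

    if (num_eq > flatness_threshold)
        return range_too_big ? 0 : 1;
    return 2;
}
```

The modulo-256 wrap in the difference is what the FIXME near the top of the file refers to: a jump from 0 to 255 wraps to a difference of 1 and is counted as nearly equal. In the asm each match adds 0xFF (that is, -1) to a byte accumulator, so the count is recovered afterwards with `(-numEq) & 0xFF`.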
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
190
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
191 /**
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
192 * Do a vertical low pass filter on the 8x16 block (only write to the 8x8 block in the middle)
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
193 * using the 9-Tap Filter (1,1,2,2,4,2,2,1,1)/16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
194 */
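The 9-tap kernel named in the comment above can be sketched for a single pixel column as follows. This is an illustrative model under stated assumptions, not the file's implementation: the function name is hypothetical, the `+8` rounding term is an assumption (the PAVGB-based asm rounds differently), and the edge rule (use the outer row only when it differs from its neighbour by at most QP) is read off the "diff <= QP" comments in the asm.

```c
#include <stdint.h>
#include <stdlib.h>

/* col[0..9]: ten vertically adjacent pixels of one column.
 * out[0..7]: filtered values for rows 1..8 (the middle 8 rows). */
static void vert_lowpass_column_model(const uint8_t col[10], int qp,
                                      uint8_t out[8])
{
    static const int taps[9] = {1, 1, 2, 2, 4, 2, 2, 1, 1};   /* sum = 16 */

    /* Edge samples: trust the outer row only if it is within QP of its
     * neighbour, otherwise fall back to the neighbour itself. */
    int first = abs(col[0] - col[1]) <= qp ? col[0] : col[1];
    int last  = abs(col[8] - col[9]) <= qp ? col[9] : col[8];

    for (int y = 1; y <= 8; y++) {
        int sum = 8;   /* rounding term: an assumption of this sketch */
        for (int k = -4; k <= 4; k++) {
            int i = y + k;
            /* Taps that fall outside rows 1..8 reuse the edge samples. */
            int v = i < 1 ? first : (i > 8 ? last : col[i]);
            sum += taps[k + 4] * v;
        }
        out[y - 1] = (uint8_t)(sum >> 4);
    }
}
```

For row 1 the out-of-range taps collapse to a weight of 6 on the first sample followed by 4, 2, 2, 1, 1, which matches the weight table written in the comments inside the asm block.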
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
195 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
196 static inline void RENAME(doVertLowPass)(uint8_t *src, int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
197 {
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
198 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
199 src+= stride*3;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
200 asm volatile( //"movv %0 %1 %2\n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
201 "movq %2, %%mm0 \n\t" // QP,..., QP
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
202 "pxor %%mm4, %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
203
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
204 "movq (%0), %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
205 "movq (%0, %1), %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
206 "movq %%mm5, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
207 "movq %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
208 "psubusb %%mm6, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
209 "psubusb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
210 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
211 "psubusb %%mm0, %%mm2 \n\t" // diff <= QP -> 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
212 "pcmpeqb %%mm4, %%mm2 \n\t" // diff <= QP -> FF
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
213
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
214 "pand %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
215 "pandn %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
216 "por %%mm2, %%mm6 \n\t"// First Line to Filter
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
217
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
218 "movq (%0, %1, 8), %%mm5 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
219 "lea (%0, %1, 4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
220 "lea (%0, %1, 8), %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
221 "sub %1, %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
222 "add %1, %0 \n\t" // %0 points to line 1 not 0
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
223 "movq (%0, %1, 8), %%mm7 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
224 "movq %%mm5, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
225 "movq %%mm7, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
226 "psubusb %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
227 "psubusb %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
228 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
229 "psubusb %%mm0, %%mm2 \n\t" // diff <= QP -> 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
230 "pcmpeqb %%mm4, %%mm2 \n\t" // diff <= QP -> FF
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
231
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
232 "pand %%mm2, %%mm7 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
233 "pandn %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
234 "por %%mm2, %%mm7 \n\t" // First Line to Filter
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
235
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
236
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
237 // 1 2 3 4 5 6 7 8
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
238 // %0 %0+%1 %0+2%1 eax %0+4%1 eax+2%1 ecx eax+4%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
239 // 6 4 2 2 1 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
240 // 6 4 4 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
241 // 6 8 2
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
242
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
243 "movq (%0, %1), %%mm0 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
244 "movq %%mm0, %%mm1 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
245 PAVGB(%%mm6, %%mm0) //1 1 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
246 PAVGB(%%mm6, %%mm0) //3 1 /4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
247
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
248 "movq (%0, %1, 4), %%mm2 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
249 "movq %%mm2, %%mm5 \n\t" // 1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
250 PAVGB((%%REGa), %%mm2) // 11 /2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
251 PAVGB((%0, %1, 2), %%mm2) // 211 /4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
252 "movq %%mm2, %%mm3 \n\t" // 211 /4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
253 "movq (%0), %%mm4 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
254 PAVGB(%%mm4, %%mm3) // 4 211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
255 PAVGB(%%mm0, %%mm3) //642211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
256 "movq %%mm3, (%0) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
257 // mm1=2 mm2=3(211) mm4=1 mm5=5 mm6=0 mm7=9
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
258 "movq %%mm1, %%mm0 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
259 PAVGB(%%mm6, %%mm0) //1 1 /2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
260 "movq %%mm4, %%mm3 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
261 PAVGB((%0,%1,2), %%mm3) // 1 1 /2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
262 PAVGB((%%REGa,%1,2), %%mm5) // 11 /2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
263 PAVGB((%%REGa), %%mm5) // 211 /4
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
264 PAVGB(%%mm5, %%mm3) // 2 2211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
265 PAVGB(%%mm0, %%mm3) //4242211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
266 "movq %%mm3, (%0,%1) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
267 // mm1=2 mm2=3(211) mm4=1 mm5=4(211) mm6=0 mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
268 PAVGB(%%mm4, %%mm6) //11 /2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
269 "movq (%%"REG_c"), %%mm0 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
270 PAVGB((%%REGa, %1, 2), %%mm0) // 11/2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
271 "movq %%mm0, %%mm3 \n\t" // 11/2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
272 PAVGB(%%mm1, %%mm0) // 2 11/4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
273 PAVGB(%%mm6, %%mm0) //222 11/8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
274 PAVGB(%%mm2, %%mm0) //22242211/16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
275 "movq (%0, %1, 2), %%mm2 \n\t" // 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
276 "movq %%mm0, (%0, %1, 2) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
277 // mm1=2 mm2=3 mm3=6(11) mm4=1 mm5=4(211) mm6=0(11) mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
278 "movq (%%"REG_a", %1, 4), %%mm0 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
279 PAVGB((%%REGc), %%mm0) // 11 /2
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
280 PAVGB(%%mm0, %%mm6) //11 11 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
281 PAVGB(%%mm1, %%mm4) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
282 PAVGB(%%mm2, %%mm1) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
283 PAVGB(%%mm1, %%mm6) //1122 11 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
284 PAVGB(%%mm5, %%mm6) //112242211 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
285 "movq (%%"REG_a"), %%mm5 \n\t" // 1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
286 "movq %%mm6, (%%"REG_a") \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
287 // mm0=7(11) mm1=2(11) mm2=3 mm3=6(11) mm4=1(11) mm5=4 mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
288 "movq (%%"REG_a", %1, 4), %%mm6 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
289 PAVGB(%%mm7, %%mm6) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
290 PAVGB(%%mm4, %%mm6) // 11 11 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
291 PAVGB(%%mm3, %%mm6) // 11 2211 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
292 PAVGB(%%mm5, %%mm2) // 11 /2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
293 "movq (%0, %1, 4), %%mm4 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
294 PAVGB(%%mm4, %%mm2) // 112 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
295 PAVGB(%%mm2, %%mm6) // 112242211 /16
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
296 "movq %%mm6, (%0, %1, 4) \n\t" // X
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
297 // mm0=7(11) mm1=2(11) mm2=3(112) mm3=6(11) mm4=5 mm5=4 mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
298 PAVGB(%%mm7, %%mm1) // 11 2 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
299 PAVGB(%%mm4, %%mm5) // 11 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
300 PAVGB(%%mm5, %%mm0) // 11 11 /4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
301 "movq (%%"REG_a", %1, 2), %%mm6 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
302 PAVGB(%%mm6, %%mm1) // 11 4 2 /8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
303 PAVGB(%%mm0, %%mm1) // 11224222 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
304 "movq %%mm1, (%%"REG_a", %1, 2) \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
305 // mm2=3(112) mm3=6(11) mm4=5 mm5=4(11) mm6=6 mm7=9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
306 PAVGB((%%REGc), %%mm2) // 112 4 /8
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
307 "movq (%%"REG_a", %1, 4), %%mm0 \n\t" // 1
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
308 PAVGB(%%mm0, %%mm6) // 1 1 /2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
309 PAVGB(%%mm7, %%mm6) // 1 12 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
310 PAVGB(%%mm2, %%mm6) // 1122424 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
311 "movq %%mm6, (%%"REG_c") \n\t" // X
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
312 // mm0=8 mm3=6(11) mm4=5 mm5=4(11) mm7=9
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
313 PAVGB(%%mm7, %%mm5) // 11 2 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
314 PAVGB(%%mm7, %%mm5) // 11 6 /8
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
315
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
316 PAVGB(%%mm3, %%mm0) // 112 /4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
317 PAVGB(%%mm0, %%mm5) // 112246 /16
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
318 "movq %%mm5, (%%"REG_a", %1, 4) \n\t" // X
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
319 "sub %1, %0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
320
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
321 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
322 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
323 : "%"REG_a, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
324 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
325 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
326 const int l1= stride;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
327 const int l2= stride + l1;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
328 const int l3= stride + l2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
329 const int l4= stride + l3;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
330 const int l5= stride + l4;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
331 const int l6= stride + l5;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
332 const int l7= stride + l6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
333 const int l8= stride + l7;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
334 const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
335 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
336 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
337 for(x=0; x<BLOCK_SIZE; x++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
338 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
339 const int first= ABS(src[0] - src[l1]) < c->QP ? src[0] : src[l1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
340 const int last= ABS(src[l8] - src[l9]) < c->QP ? src[l9] : src[l8];
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
341
2038
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
342 int sums[10];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
343 sums[0] = 4*first + src[l1] + src[l2] + src[l3] + 4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
344 sums[1] = sums[0] - first + src[l4];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
345 sums[2] = sums[1] - first + src[l5];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
346 sums[3] = sums[2] - first + src[l6];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
347 sums[4] = sums[3] - first + src[l7];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
348 sums[5] = sums[4] - src[l1] + src[l8];
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
349 sums[6] = sums[5] - src[l2] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
350 sums[7] = sums[6] - src[l3] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
351 sums[8] = sums[7] - src[l4] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
352 sums[9] = sums[8] - src[l5] + last;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
353
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
354 src[l1]= (sums[0] + sums[2] + 2*src[l1])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
355 src[l2]= (sums[1] + sums[3] + 2*src[l2])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
356 src[l3]= (sums[2] + sums[4] + 2*src[l3])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
357 src[l4]= (sums[3] + sums[5] + 2*src[l4])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
358 src[l5]= (sums[4] + sums[6] + 2*src[l5])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
359 src[l6]= (sums[5] + sums[7] + 2*src[l6])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
360 src[l7]= (sums[6] + sums[8] + 2*src[l7])>>4;
02b59a3c62cd faster c lowpass filter
michael
parents: 2037
diff changeset
361 src[l8]= (sums[7] + sums[9] + 2*src[l8])>>4;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
362
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
363 src++;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
364 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
365 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
366 }
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
367 #endif //HAVE_ALTIVEC
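The running-sum C fallback above can be read as a 9-tap (1,1,2,2,4,2,2,1,1)/16 filter with the edge lines replaced by `first` and `last`. The following is a minimal sketch of that equivalence, not part of the original file: `pick_edges`, `lowpass_running` and `lowpass_direct` are hypothetical names, and the column is held in a plain `int[10]` (lines 0..9) instead of being addressed through `src` and `stride`.

```c
#include <stdlib.h>

/* Hypothetical helpers for illustration; col[0..9] is one pixel column. */

/* Edge values as in the C fallback: use the outer line only if it is
 * within qp of its neighbour, otherwise repeat the inner line. */
static void pick_edges(const int col[10], int qp, int *first, int *last)
{
    *first = abs(col[0] - col[1]) < qp ? col[0] : col[1];
    *last  = abs(col[8] - col[9]) < qp ? col[9] : col[8];
}

/* Same arithmetic as the fallback: sums[i] is a 7-sample window centred
 * on line i (plus a rounding term of 4), updated incrementally. */
static void lowpass_running(int col[10], int qp)
{
    int first, last, sums[10], i;
    pick_edges(col, qp, &first, &last);
    sums[0] = 4*first + col[1] + col[2] + col[3] + 4;
    for (i = 1; i < 10; i++) {
        const int leaving  = i <= 4 ? first : col[i - 4];
        const int entering = i <= 5 ? col[i + 3] : last;
        sums[i] = sums[i - 1] - leaving + entering;
    }
    for (i = 1; i <= 8; i++)
        col[i] = (sums[i - 1] + sums[i + 1] + 2*col[i]) >> 4;
}

/* Direct 9-tap form: lines below 1 read as first, lines above 8 as last. */
static void lowpass_direct(int col[10], int qp)
{
    static const int taps[9] = { 1, 1, 2, 2, 4, 2, 2, 1, 1 };
    int first, last, out[10], i, k;
    pick_edges(col, qp, &first, &last);
    for (i = 1; i <= 8; i++) {
        int acc = 8; /* rounding: the two window sums contribute 4 each */
        for (k = -4; k <= 4; k++) {
            const int pos = i + k;
            const int v = pos < 1 ? first : pos > 8 ? last : col[pos];
            acc += taps[k + 4] * v;
        }
        out[i] = acc >> 4;
    }
    for (i = 1; i <= 8; i++)
        col[i] = out[i];
}
```

Both functions should produce identical results for any input column; the running-sum form simply trades the 9 multiplications per output for two additions per window update.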
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
368
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
369 #if 0
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
370 /**
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
371 * Experimental implementation of the filter (Algorithm 1) described in a paper from Ramkishor & Karandikar
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
372 * values are correctly clipped (MMX2)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
373 * values wrap around (C)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
374 * conclusion: it's fast, but introduces ugly horizontal patterns if there is a continuous gradient
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
375 0 8 16 24
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
376 x = 8
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
377 x/2 = 4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
378 x/8 = 1
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
379 1 12 12 23
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
380 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
381 static inline void RENAME(vertRK1Filter)(uint8_t *src, int stride, int QP)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
382 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
383 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
384 src+= stride*3;
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
385 // FIXME rounding
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
386 asm volatile(
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
387 "pxor %%mm7, %%mm7 \n\t" // 0
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
388 "movq "MANGLE(b80)", %%mm6 \n\t" // MIN_SIGNED_BYTE
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
389 "leal (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
390 "leal (%%"REG_a", %1, 4), %%"REG_c" \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
391 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
392 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
393 "movq "MANGLE(pQPb)", %%mm0 \n\t" // QP,..., QP
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
394 "movq %%mm0, %%mm1 \n\t" // QP,..., QP
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
395 "paddusb "MANGLE(b02)", %%mm0 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
396 "psrlw $2, %%mm0 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
397 "pand "MANGLE(b3F)", %%mm0 \n\t" // QP/4,..., QP/4
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
398 "paddusb %%mm1, %%mm0 \n\t" // QP*1.25 ...
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
399 "movq (%0, %1, 4), %%mm2 \n\t" // line 4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
400 "movq (%%"REG_c"), %%mm3 \n\t" // line 5
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
401 "movq %%mm2, %%mm4 \n\t" // line 4
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
402 "pcmpeqb %%mm5, %%mm5 \n\t" // -1
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
403 "pxor %%mm2, %%mm5 \n\t" // -line 4 - 1
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
404 PAVGB(%%mm3, %%mm5)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
405 "paddb %%mm6, %%mm5 \n\t" // (l5-l4)/2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
406 "psubusb %%mm3, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
407 "psubusb %%mm2, %%mm3 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
408 "por %%mm3, %%mm4 \n\t" // |l4 - l5|
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
409 "psubusb %%mm0, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
410 "pcmpeqb %%mm7, %%mm4 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
411 "pand %%mm4, %%mm5 \n\t" // d/2
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
412
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
413 // "paddb %%mm6, %%mm2 \n\t" // line 4 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
414 "paddb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
415 // "psubb %%mm6, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
416 "movq %%mm2, (%0,%1, 4) \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
417
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
418 "movq (%%"REG_c"), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
419 // "paddb %%mm6, %%mm2 \n\t" // line 5 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
420 "psubb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
421 // "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
422 "movq %%mm2, (%%"REG_c") \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
423
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
424 "paddb %%mm6, %%mm5 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
425 "psrlw $2, %%mm5 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
426 "pand "MANGLE(b3F)", %%mm5 \n\t"
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
427 "psubb "MANGLE(b20)", %%mm5 \n\t" // (l5-l4)/8
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
428
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
429 "movq (%%"REG_a", %1, 2), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
430 "paddb %%mm6, %%mm2 \n\t" // line 3 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
431 "paddsb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
432 "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
433 "movq %%mm2, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
434
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
435 "movq (%%"REG_c", %1), %%mm2 \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
436 "paddb %%mm6, %%mm2 \n\t" // line 6 + 0x80
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
437 "psubsb %%mm5, %%mm2 \n\t"
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
438 "psubb %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
439 "movq %%mm2, (%%"REG_c", %1) \n\t"
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
440
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
441 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
442 : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
443 : "%"REG_a, "%"REG_c
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
444 );
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
445 #else
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
446 const int l1= stride;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
447 const int l2= stride + l1;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
448 const int l3= stride + l2;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
449 const int l4= stride + l3;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
450 const int l5= stride + l4;
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
451 const int l6= stride + l5;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
452 // const int l7= stride + l6;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
453 // const int l8= stride + l7;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
454 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
455 int x;
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
456 const int QP15= QP + (QP>>2);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
457 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
458 for(x=0; x<BLOCK_SIZE; x++)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
459 {
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
460 const int v = (src[x+l5] - src[x+l4]);
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
461 if(ABS(v) < QP15)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
462 {
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
463 src[x+l3] +=v>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
464 src[x+l4] +=v>>1;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
465 src[x+l5] -=v>>1;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
466 src[x+l6] -=v>>3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
467
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
468 }
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
469 }
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
470
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
471 #endif
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
472 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
473 #endif
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
474
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
475 /**
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
476 * Experimental Filter 1
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
477 * will not damage linear gradients
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
478 * Flat blocks should look like they were passed through the (1,1,2,2,4,2,2,1,1) 9-Tap filter
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
479 * can only smooth blocks at the expected locations (it can't smooth them if they moved)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
480 * MMX2 version does correct clipping, C version doesn't
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
481 */
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
482 static inline void RENAME(vertX1Filter)(uint8_t *src, int stride, PPContext *co)
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
483 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
484 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
485 src+= stride*3;
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
486
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
487 asm volatile(
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
488 "pxor %%mm7, %%mm7 \n\t" // 0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
489 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
490 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
491 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
492 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
493 "movq (%%"REG_a", %1, 2), %%mm0 \n\t" // line 3
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
494 "movq (%0, %1, 4), %%mm1 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
495 "movq %%mm1, %%mm2 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
496 "psubusb %%mm0, %%mm1 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
497 "psubusb %%mm2, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
498 "por %%mm1, %%mm0 \n\t" // |l3 - l4|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
499 "movq (%%"REG_c"), %%mm3 \n\t" // line 5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
500 "movq (%%"REG_c", %1), %%mm4 \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
501 "movq %%mm3, %%mm5 \n\t" // line 5
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
502 "psubusb %%mm4, %%mm3 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
503 "psubusb %%mm5, %%mm4 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
504 "por %%mm4, %%mm3 \n\t" // |l5 - l6|
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
505 PAVGB(%%mm3, %%mm0) // (|l3 - l4| + |l5 - l6|)/2
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
506 "movq %%mm2, %%mm1 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
507 "psubusb %%mm5, %%mm2 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
508 "movq %%mm2, %%mm4 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
509 "pcmpeqb %%mm7, %%mm2 \n\t" // (l4 - l5) <= 0 ? -1 : 0
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
510 "psubusb %%mm1, %%mm5 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
511 "por %%mm5, %%mm4 \n\t" // |l4 - l5|
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
512 "psubusb %%mm0, %%mm4 \n\t" //d = MAX(0, |l4-l5| - (|l3-l4| + |l5-l6|)/2)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
513 "movq %%mm4, %%mm3 \n\t" // d
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
514 "movq %2, %%mm0 \n\t"
334
3912b37ba121 x1 deblocking filter bugfix
michael
parents: 224
diff changeset
515 "paddusb %%mm0, %%mm0 \n\t"
3912b37ba121 x1 deblocking filter bugfix
michael
parents: 224
diff changeset
516 "psubusb %%mm0, %%mm4 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
517 "pcmpeqb %%mm7, %%mm4 \n\t" // d <= 2*QP ? -1 : 0
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
518 "psubusb "MANGLE(b01)", %%mm3 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
519 "pand %%mm4, %%mm3 \n\t" // d <= 2*QP ? d : 0
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
520
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
521 PAVGB(%%mm7, %%mm3) // d/2
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
522 "movq %%mm3, %%mm1 \n\t" // d/2
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
523 PAVGB(%%mm7, %%mm3) // d/4
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
524 PAVGB(%%mm1, %%mm3) // 3*d/8
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
525
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
526 "movq (%0, %1, 4), %%mm0 \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
527 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l4-1 : l4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
528 "psubusb %%mm3, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
529 "pxor %%mm2, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
530 "movq %%mm0, (%0, %1, 4) \n\t" // line 4
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
531
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
532 "movq (%%"REG_c"), %%mm0 \n\t" // line 5
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
533 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l5-1 : l5
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
534 "paddusb %%mm3, %%mm0 \n\t"
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
535 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
536 "movq %%mm0, (%%"REG_c") \n\t" // line 5
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
537
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
538 PAVGB(%%mm7, %%mm1) // d/4
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
539
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
540 "movq (%%"REG_a", %1, 2), %%mm0 \n\t" // line 3
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
541 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l3-1 : l3
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
542 "psubusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
543 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
544 "movq %%mm0, (%%"REG_a", %1, 2) \n\t" // line 3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
545
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
546 "movq (%%"REG_c", %1), %%mm0 \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
547 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l6-1 : l6
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
548 "paddusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
549 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
550 "movq %%mm0, (%%"REG_c", %1) \n\t" // line 6
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
551
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
552 PAVGB(%%mm7, %%mm1) // d/8
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
553
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
554 "movq (%%"REG_a", %1), %%mm0 \n\t" // line 2
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
555 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l2-1 : l2
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
556 "psubusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
557 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
558 "movq %%mm0, (%%"REG_a", %1) \n\t" // line 2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
559
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
560 "movq (%%"REG_c", %1, 2), %%mm0 \n\t" // line 7
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
561 "pxor %%mm2, %%mm0 \n\t" //(l4 - l5) <= 0 ? -l7-1 : l7
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
562 "paddusb %%mm1, %%mm0 \n\t"
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
563 "pxor %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
564 "movq %%mm0, (%%"REG_c", %1, 2) \n\t" // line 7
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
565
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
566 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
567 : "r" (src), "r" ((long)stride), "m" (co->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
568 : "%"REG_a, "%"REG_c
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
569 );
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
570 #else
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
571
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
572 const int l1= stride;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
573 const int l2= stride + l1;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
574 const int l3= stride + l2;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
575 const int l4= stride + l3;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
576 const int l5= stride + l4;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
577 const int l6= stride + l5;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
578 const int l7= stride + l6;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
579 // const int l8= stride + l7;
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
580 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
581 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
582
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
583 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
584 for(x=0; x<BLOCK_SIZE; x++)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
585 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
586 int a= src[l3] - src[l4];
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
587 int b= src[l4] - src[l5];
99
4f072fa99ccf fixed a rounding bug thing in the X1 Filter
michael
parents: 98
diff changeset
588 int c= src[l5] - src[l6];
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
589
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
590 int d= ABS(b) - ((ABS(a) + ABS(c))>>1);
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
591 d= MAX(d, 0);
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
592
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
593 if(d < co->QP*2)
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
594 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
595 int v = d * SIGN(-b);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
596
141
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
597 src[l2] +=v>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
598 src[l3] +=v>>2;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
599 src[l4] +=(3*v)>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
600 src[l5] -=(3*v)>>3;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
601 src[l6] -=v>>2;
626bfabff1f5 c speedup (x1, rk1 filters)
michael
parents: 140
diff changeset
602 src[l7] -=v>>3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
603
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
604 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
605 src++;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
606 }
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
607 #endif
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
608 }
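The C path of the X1 filter above can be restated for a single pixel column. This is a standalone sketch, not code from this file: the names `x1_filter_column` and `clip_uint8` are hypothetical, and the explicit clamping is an addition (the header comment notes that the C version does not clip). It assumes that SIGN(x) is 1 for x > 0 and -1 otherwise, and that `>>` on a negative int is an arithmetic shift, as on common compilers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: clamp an int to the 0..255 range of a pixel. */
static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/*
 * Hypothetical single-column version of the scalar X1 filter.
 * p points at line 0 of the column and stride is the distance between
 * lines; the block edge sits between lines 4 and 5, as in the listing.
 */
static void x1_filter_column(uint8_t *p, int stride, int qp)
{
    const int l2 = 2 * stride, l3 = 3 * stride, l4 = 4 * stride;
    const int l5 = 5 * stride, l6 = 6 * stride, l7 = 7 * stride;

    const int a = p[l3] - p[l4];   /* gradient just above the edge */
    const int b = p[l4] - p[l5];   /* step across the edge         */
    const int c = p[l5] - p[l6];   /* gradient just below the edge */

    /* Edge strength beyond what the neighbouring gradients explain. */
    int d = abs(b) - ((abs(a) + abs(c)) >> 1);
    if (d < 0)
        d = 0;

    if (d < 2 * qp) {
        /* Assumed sign convention: 1 if -b > 0, else -1. */
        const int v = d * (-b > 0 ? 1 : -1);

        /* Spread the correction over six lines with weights 1,2,3,3,2,1 (in eighths). */
        p[l2] = clip_uint8(p[l2] + (v >> 3));
        p[l3] = clip_uint8(p[l3] + (v >> 2));
        p[l4] = clip_uint8(p[l4] + ((3 * v) >> 3));
        p[l5] = clip_uint8(p[l5] - ((3 * v) >> 3));
        p[l6] = clip_uint8(p[l6] - (v >> 2));
        p[l7] = clip_uint8(p[l7] - (v >> 3));
    }
}
```

For a column 10,10,10,10,10,18,18,18,18,18 with stride 1 and QP 8, the step of 8 is turned into the ramp 10,10,11,12,13,15,16,17,18,18. With QP 4 the same column is left untouched, because d = 8 is not below 2*QP.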
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
609
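The MMX2/3DNow path of the X1 filter derives its d/2, d/4, 3*d/8 and d/8 terms from byte averages rather than shifts. The scalar model below is a sketch under stated assumptions: PAVGB is taken to compute (a + b + 1) >> 1 per byte, `b01` is taken to hold 0x01 in every byte (its definition is not part of this section), and the QP mask is ignored. The names `avg_round_up`, `sub_sat`, `x1_terms` and `x1_terms_from_d` are hypothetical.

```c
#include <stdint.h>

/* Assumed model of PAVGB for one byte: average rounded up. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Model of psubusb for one byte: subtraction saturating at 0. */
static uint8_t sub_sat(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Hypothetical container for the three correction terms. */
struct x1_terms {
    uint8_t three_eighths;  /* applied to lines 4 and 5 */
    uint8_t quarter;        /* applied to lines 3 and 6 */
    uint8_t eighth;         /* applied to lines 2 and 7 */
};

/* Follows the register use of the listing: m3 stands for mm3, m1 for mm1. */
static struct x1_terms x1_terms_from_d(uint8_t d)
{
    struct x1_terms t;
    uint8_t m3, m1;

    m3 = sub_sat(d, 1);         /* psubusb b01: cancels the round-up below */
    m3 = avg_round_up(0, m3);   /* d/2 */
    m1 = m3;                    /* keep d/2 */
    m3 = avg_round_up(0, m3);   /* d/4, rounded up */
    m3 = avg_round_up(m1, m3);  /* about 3*d/8 */
    t.three_eighths = m3;

    m1 = avg_round_up(0, m1);   /* d/4, rounded up */
    t.quarter = m1;
    m1 = avg_round_up(0, m1);   /* d/8, rounded up */
    t.eighth = m1;
    return t;
}
```

For d = 8 the model gives 3, 2, 1, the same values as (3*v)>>3, v>>2 and v>>3 in the C path. For d = 5 it gives 2, 1, 1 where the C path gives 1, 1, 0, so under these assumptions the two paths are not bit-identical.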
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
610 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
611 static inline void RENAME(doVertDefFilter)(uint8_t src[], int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
612 {
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
613 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
614 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
615 uint8_t tmp[16];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
616 const int l1= stride;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
617 const int l2= stride + l1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
618 const int l3= stride + l2;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
619 const int l4= (int)tmp - (int)src - stride*3;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
620 const int l5= (int)tmp - (int)src - stride*3 + 8;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
621 const int l6= stride*3 + l3;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
622 const int l7= stride + l6;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
623 const int l8= stride + l7;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
624
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
625 memcpy(tmp, src+stride*7, 8);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
626 memcpy(tmp+8, src+stride*8, 8);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
627 */
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
628 src+= stride*4;
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
629 asm volatile(
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
630
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
631 #if 0 //slightly more accurate and slightly slower
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
632 "pxor %%mm7, %%mm7 \n\t" // 0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
633 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
634 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
635 // 0 1 2 3 4 5 6 7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
636 // %0 %0+%1 %0+2%1 eax+2%1 %0+4%1 eax+4%1 ecx+%1 ecx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
637 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
638
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
639
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
640 "movq (%0, %1, 2), %%mm0 \n\t" // l2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
641 "movq (%0), %%mm1 \n\t" // l0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
642 "movq %%mm0, %%mm2 \n\t" // l2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
643 PAVGB(%%mm7, %%mm0) // ~l2/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
644 PAVGB(%%mm1, %%mm0) // ~(l2 + 2l0)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
645 PAVGB(%%mm2, %%mm0) // ~(5l2 + 2l0)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
646
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
647 "movq (%%"REG_a"), %%mm1 \n\t" // l1
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
648 "movq (%%"REG_a", %1, 2), %%mm3 \n\t" // l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
649 "movq %%mm1, %%mm4 \n\t" // l1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
650 PAVGB(%%mm7, %%mm1) // ~l1/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
651 PAVGB(%%mm3, %%mm1) // ~(l1 + 2l3)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
652 PAVGB(%%mm4, %%mm1) // ~(5l1 + 2l3)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
653
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
654 "movq %%mm0, %%mm4 \n\t" // ~(5l2 + 2l0)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
655 "psubusb %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
656 "psubusb %%mm4, %%mm1 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
657 "por %%mm0, %%mm1 \n\t" // ~|2l0 - 5l1 + 5l2 - 2l3|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
658 // mm1= |lenergy|, mm2= l2, mm3= l3, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
659
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
660 "movq (%0, %1, 4), %%mm0 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
661 "movq %%mm0, %%mm4 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
662 PAVGB(%%mm7, %%mm0) // ~l4/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
663 PAVGB(%%mm2, %%mm0) // ~(l4 + 2l2)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
664 PAVGB(%%mm4, %%mm0) // ~(5l4 + 2l2)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
665
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
666 "movq (%%"REG_c"), %%mm2 \n\t" // l5
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
667 "movq %%mm3, %%mm5 \n\t" // l3
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
668 PAVGB(%%mm7, %%mm3) // ~l3/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
669 PAVGB(%%mm2, %%mm3) // ~(l3 + 2l5)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
670 PAVGB(%%mm5, %%mm3) // ~(5l3 + 2l5)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
671
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
672 "movq %%mm0, %%mm6 \n\t" // ~(5l4 + 2l2)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
673 "psubusb %%mm3, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
674 "psubusb %%mm6, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
675 "por %%mm0, %%mm3 \n\t" // ~|2l2 - 5l3 + 5l4 - 2l5|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
676 "pcmpeqb %%mm7, %%mm0 \n\t" // SIGN(2l2 - 5l3 + 5l4 - 2l5)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
677 // mm0= SIGN(menergy), mm1= |lenergy|, mm2= l5, mm3= |menergy|, mm4=l4, mm5= l3, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
679 "movq (%%"REG_c", %1), %%mm6 \n\t" // l6
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
680 "movq %%mm6, %%mm5 \n\t" // l6
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
681 PAVGB(%%mm7, %%mm6) // ~l6/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
682 PAVGB(%%mm4, %%mm6) // ~(l6 + 2l4)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
683 PAVGB(%%mm5, %%mm6) // ~(5l6 + 2l4)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
684
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
685 "movq (%%"REG_c", %1, 2), %%mm5 \n\t" // l7
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
686 "movq %%mm2, %%mm4 \n\t" // l5
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
687 PAVGB(%%mm7, %%mm2) // ~l5/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
688 PAVGB(%%mm5, %%mm2) // ~(l5 + 2l7)/4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
689 PAVGB(%%mm4, %%mm2) // ~(5l5 + 2l7)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
690
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
691 "movq %%mm6, %%mm4 \n\t" // ~(5l6 + 2l4)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
692 "psubusb %%mm2, %%mm6 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
693 "psubusb %%mm4, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
694 "por %%mm6, %%mm2 \n\t" // ~|2l4 - 5l5 + 5l6 - 2l7|/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
695 // mm0= SIGN(menergy), mm1= |lenergy|/8, mm2= |renergy|/8, mm3= |menergy|/8, mm7=0
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
696
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
697
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
698 PMINUB(%%mm2, %%mm1, %%mm4) // MIN(|lenergy|,|renergy|)/8
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
699 "movq %2, %%mm4 \n\t" // QP //FIXME QP+1 ?
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
700 "paddusb "MANGLE(b01)", %%mm4 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
701 "pcmpgtb %%mm3, %%mm4 \n\t" // |menergy|/8 < QP
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
702 "psubusb %%mm1, %%mm3 \n\t" // d=|menergy|/8-MIN(|lenergy|,|renergy|)/8
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
703 "pand %%mm4, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
704
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
705 "movq %%mm3, %%mm1 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
706 // "psubusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
707 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
708 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
709 "paddusb %%mm1, %%mm3 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
710 // "paddusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
711
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
712 "movq (%%"REG_a", %1, 2), %%mm6 \n\t" //l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
713 "movq (%0, %1, 4), %%mm5 \n\t" //l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
714 "movq (%0, %1, 4), %%mm4 \n\t" //l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
715 "psubusb %%mm6, %%mm5 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
716 "psubusb %%mm4, %%mm6 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
717 "por %%mm6, %%mm5 \n\t" // |l3-l4|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
718 "pcmpeqb %%mm7, %%mm6 \n\t" // SIGN(l3-l4)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
719 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
720 "pand %%mm0, %%mm3 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
721 PMINUB(%%mm5, %%mm3, %%mm0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
722
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
723 "psubusb "MANGLE(b01)", %%mm3 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
724 PAVGB(%%mm7, %%mm3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
725
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
726 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
727 "movq (%0, %1, 4), %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
728 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
729 "pxor %%mm6, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
730 "psubb %%mm3, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
731 "paddb %%mm3, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
732 "pxor %%mm6, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
733 "pxor %%mm6, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
734 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
735 "movq %%mm2, (%0, %1, 4) \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
736 #endif
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
737
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
738 "lea (%0, %1), %%"REG_a" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
739 "pcmpeqb %%mm6, %%mm6 \n\t" // -1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
740 // 0 1 2 3 4 5 6 7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
741 // %0 %0+%1 %0+2%1 eax+2%1 %0+4%1 eax+4%1 ecx+%1 ecx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
742 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
743
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
744
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
745 "movq (%%"REG_a", %1, 2), %%mm1 \n\t" // l3
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
746 "movq (%0, %1, 4), %%mm0 \n\t" // l4
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
747 "pxor %%mm6, %%mm1 \n\t" // -l3-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
748 PAVGB(%%mm1, %%mm0) // -q+128 = (l4-l3+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
749 // mm1=-l3-1, mm0=128-q
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
750
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
751 "movq (%%"REG_a", %1, 4), %%mm2 \n\t" // l5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
752 "movq (%%"REG_a", %1), %%mm3 \n\t" // l2
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
753 "pxor %%mm6, %%mm2 \n\t" // -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
754 "movq %%mm2, %%mm5 \n\t" // -l5-1
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
755 "movq "MANGLE(b80)", %%mm4 \n\t" // 128
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
756 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
757 PAVGB(%%mm3, %%mm2) // (l2-l5+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
758 PAVGB(%%mm0, %%mm4) // ~(l4-l3)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
759 PAVGB(%%mm2, %%mm4) // ~(l2-l5)/4 +(l4-l3)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
760 PAVGB(%%mm0, %%mm4) // ~(l2-l5)/8 +5(l4-l3)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
761 // mm1=-l3-1, mm0=128-q, mm3=l2, mm4=menergy/16 + 128, mm5= -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
762
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
763 "movq (%%"REG_a"), %%mm2 \n\t" // l1
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
764 "pxor %%mm6, %%mm2 \n\t" // -l1-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
765 PAVGB(%%mm3, %%mm2) // (l2-l1+256)/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
766 PAVGB((%0), %%mm1) // (l0-l3+256)/2
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
767 "movq "MANGLE(b80)", %%mm3 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
768 PAVGB(%%mm2, %%mm3) // ~(l2-l1)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
769 PAVGB(%%mm1, %%mm3) // ~(l0-l3)/4 +(l2-l1)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
770 PAVGB(%%mm2, %%mm3) // ~(l0-l3)/8 +5(l2-l1)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
771 // mm0=128-q, mm3=lenergy/16 + 128, mm4= menergy/16 + 128, mm5= -l5-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
772
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
773 PAVGB((%%REGc, %1), %%mm5) // (l6-l5+256)/2
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
774 "movq (%%"REG_c", %1, 2), %%mm1 \n\t" // l7
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
775 "pxor %%mm6, %%mm1 \n\t" // -l7-1
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
776 PAVGB((%0, %1, 4), %%mm1) // (l4-l7+256)/2
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
777 "movq "MANGLE(b80)", %%mm2 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
778 PAVGB(%%mm5, %%mm2) // ~(l6-l5)/4 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
779 PAVGB(%%mm1, %%mm2) // ~(l4-l7)/4 +(l6-l5)/8 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
780 PAVGB(%%mm5, %%mm2) // ~(l4-l7)/8 +5(l6-l5)/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
781 // mm0=128-q, mm2=renergy/16 + 128, mm3=lenergy/16 + 128, mm4= menergy/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
782
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
783 "movq "MANGLE(b00)", %%mm1 \n\t" // 0
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
784 "movq "MANGLE(b00)", %%mm5 \n\t" // 0
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
785 "psubb %%mm2, %%mm1 \n\t" // 128 - renergy/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
786 "psubb %%mm3, %%mm5 \n\t" // 128 - lenergy/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
787 PMAXUB(%%mm1, %%mm2) // 128 + |renergy/16|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
788 PMAXUB(%%mm5, %%mm3) // 128 + |lenergy/16|
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
789 PMINUB(%%mm2, %%mm3, %%mm1) // 128 + MIN(|lenergy|,|renergy|)/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
790
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
791 // mm0=128-q, mm3=128 + MIN(|lenergy|,|renergy|)/16, mm4= menergy/16 + 128
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
792
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
793 "movq "MANGLE(b00)", %%mm7 \n\t" // 0
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
794 "movq %2, %%mm2 \n\t" // QP
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
795 PAVGB(%%mm6, %%mm2) // 128 + QP/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
796 "psubb %%mm6, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
797
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
798 "movq %%mm4, %%mm1 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
799 "pcmpgtb %%mm7, %%mm1 \n\t" // SIGN(menergy)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
800 "pxor %%mm1, %%mm4 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
801 "psubb %%mm1, %%mm4 \n\t" // 128 + |menergy|/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
802 "pcmpgtb %%mm4, %%mm2 \n\t" // |menergy|/16 < QP/2
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
803 "psubusb %%mm3, %%mm4 \n\t" //d=|menergy|/16 - MIN(|lenergy|,|renergy|)/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
804 // mm0=128-q, mm1= SIGN(menergy), mm2= |menergy|/16 < QP/2, mm4= d/16
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
805
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
806 "movq %%mm4, %%mm3 \n\t" // d
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
807 "psubusb "MANGLE(b01)", %%mm4 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
808 PAVGB(%%mm7, %%mm4) // d/32
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
809 PAVGB(%%mm7, %%mm4) // (d + 32)/64
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
810 "paddb %%mm3, %%mm4 \n\t" // 5d/64
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
811 "pand %%mm2, %%mm4 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
812
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
813 "movq "MANGLE(b80)", %%mm5 \n\t" // 128
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
814 "psubb %%mm0, %%mm5 \n\t" // q
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
815 "paddsb %%mm6, %%mm5 \n\t" // fix bad rounding
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
816 "pcmpgtb %%mm5, %%mm7 \n\t" // SIGN(q)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
817 "pxor %%mm7, %%mm5 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
818
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
819 PMINUB(%%mm5, %%mm4, %%mm3) // MIN(|q|, 5d/64)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
820 "pxor %%mm1, %%mm7 \n\t" // SIGN(d*q)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
821
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
822 "pand %%mm7, %%mm4 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
823 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
824 "movq (%0, %1, 4), %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
825 "pxor %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
826 "pxor %%mm1, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
827 "paddb %%mm4, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
828 "psubb %%mm4, %%mm2 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
829 "pxor %%mm1, %%mm0 \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
830 "pxor %%mm1, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
831 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
832 "movq %%mm2, (%0, %1, 4) \n\t"
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
833
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
834 :
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
835 : "r" (src), "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
836 : "%"REG_a, "%"REG_c
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
837 );
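Two byte-wise idioms recur in the assembly statement that ends here: an absolute difference built from a pair of saturating subtractions joined with `por`, and a biased signed half-difference obtained by averaging one operand with the complement of the other (the `// -l3-1` and `(l4-l3+256)/2` comments). The following is a minimal scalar sketch of a single byte lane. It assumes `PAVGB` expands to an instruction with `pavgb` semantics (unsigned average, rounded up); the helper names `absdiff`, `biased_half_diff` and `count_mismatches` are hypothetical and do not appear in the file.

```c
#include <stdint.h>

/* One byte lane of PAVGB: unsigned average, rounded up (assumed semantics). */
static uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* One byte lane of PSUBUSB: unsigned subtract that saturates at 0. */
static uint8_t psubusb(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b| as in "psubusb / psubusb / por": one of the two
 * saturating differences is always 0, so OR-ing them keeps the other. */
static uint8_t absdiff(uint8_t a, uint8_t b)
{
    return (uint8_t)(psubusb(a, b) | psubusb(b, a));
}

/* (b - a + 256) / 2 without leaving 8 bits: a ^ 0xFF equals 255 - a,
 * i.e. "-a-1" modulo 256, and the rounding +1 of PAVGB supplies the rest. */
static uint8_t biased_half_diff(uint8_t a, uint8_t b)
{
    return pavgb((uint8_t)(a ^ 0xFF), b);
}

/* Exhaustive comparison against plain integer arithmetic; returns the
 * number of (a, b) pairs for which either idiom disagrees. */
static int count_mismatches(void)
{
    int a, b, bad = 0;
    for (a = 0; a < 256; a++) {
        for (b = 0; b < 256; b++) {
            int ad = a > b ? a - b : b - a;
            if (absdiff((uint8_t)a, (uint8_t)b) != ad)
                bad++;
            if (biased_half_diff((uint8_t)a, (uint8_t)b) != ((b - a + 256) >> 1))
                bad++;
        }
    }
    return bad;
}
```

The bias of 128 is what lets a signed quantity such as l4-l3 travel through unsigned byte arithmetic; repeated averaging then builds the weighted sums that the comments mark with `~` as approximate.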
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
838
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
839 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
840 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
841 int x;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
842 src-= stride;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
843 for(x=0; x<BLOCK_SIZE; x++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
844 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
845 const int middleEnergy= 5*(src[l5] - src[l4]) + 2*(src[l3] - src[l6]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
846 if(ABS(middleEnergy)< 8*QP)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
847 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
848 const int q=(src[l4] - src[l5])/2;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
849 const int leftEnergy= 5*(src[l3] - src[l2]) + 2*(src[l1] - src[l4]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
850 const int rightEnergy= 5*(src[l7] - src[l6]) + 2*(src[l5] - src[l8]);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
851
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
852 int d= ABS(middleEnergy) - MIN( ABS(leftEnergy), ABS(rightEnergy) );
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
853 d= MAX(d, 0);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
854
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
855 d= (5*d + 32) >> 6;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
856 d*= SIGN(-middleEnergy);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
857
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
858 if(q>0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
859 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
860 d= d<0 ? 0 : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
861 d= d>q ? q : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
862 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
863 else
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
864 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
865 d= d>0 ? 0 : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
866 d= d<q ? q : d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
867 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
868
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
869 src[l4]-= d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
870 src[l5]+= d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
871 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
872 src++;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
873 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
874 src-=8;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
875 for(x=0; x<8; x++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
876 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
877 int y;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
878 for(y=4; y<6; y++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
879 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
880 int d= src[x+y*stride] - tmp[x+(y-4)*8];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
881 int ad= ABS(d);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
882 static int max=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
883 static int sum=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
884 static int num=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
885 static int bias=0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
886
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
887 if(max<ad) max=ad;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
888 sum+= ad>3 ? 1 : 0;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
889 if(ad>3)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
890 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
891 src[0] = src[7] = src[stride*7] = src[(stride+1)*7]=255;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
892 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
893 if(y==4) bias+=d;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
894 num++;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
895 if(num%1000000 == 0)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
896 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
897 printf(" %d %d %d %d\n", num, sum, max, bias);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
898 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
899 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
900 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
901 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
902 */
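The first loop of the commented-out reference code above can be restated as a standalone function for one pixel column. This is only a sketch under stated assumptions: `l1` to `l8` are taken to be consecutive lines one stride apart (their definitions lie outside this excerpt), `SIGN()` is assumed to return -1 for non-positive input, and the names `deblock_column` and `sign_of` are hypothetical. It follows the exact integer arithmetic of the reference loop, not the approximations used by the assembly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed behaviour of the SIGN() macro: 1 for positive input, else -1. */
static int sign_of(int v)
{
    return v > 0 ? 1 : -1;
}

/*
 * Filter one pixel column across a block edge.
 * p points at the sample the reference code calls l1; step is the distance
 * between consecutive lines (assumption: l2 = l1 + step, and so on).
 * The edge lies between l4 and l5. qp is the quantiser threshold.
 */
static void deblock_column(uint8_t *p, int step, int qp)
{
    int l[8];            /* l[0] is l1, l[7] is l8 */
    int i, middle, q, left, right, min_side, d;

    for (i = 0; i < 8; i++)
        l[i] = p[i * step];

    middle = 5 * (l[4] - l[3]) + 2 * (l[2] - l[5]);
    if (abs(middle) >= 8 * qp)
        return;          /* step too large: treated as image detail */

    q     = (l[3] - l[4]) / 2;
    left  = 5 * (l[2] - l[1]) + 2 * (l[0] - l[3]);
    right = 5 * (l[6] - l[5]) + 2 * (l[4] - l[7]);
    min_side = abs(left) < abs(right) ? abs(left) : abs(right);

    d = abs(middle) - min_side;
    if (d < 0)
        d = 0;
    d = (5 * d + 32) >> 6;
    d *= sign_of(-middle);

    /* Clamp d to the interval between 0 and q so the correction never
     * overshoots half of the step across the edge. */
    if (q > 0) {
        if (d < 0) d = 0;
        if (d > q) d = q;
    } else {
        if (d > 0) d = 0;
        if (d < q) d = q;
    }

    p[3 * step] = (uint8_t)(l[3] - d);
    p[4 * step] = (uint8_t)(l[4] + d);
}
```

For a column holding 100 on four lines followed by 110 on four lines, with `step` 1 and `qp` 10, the two samples next to the edge become 102 and 108; with `qp` 3 the threshold test fails and the column is left unchanged.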
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
903 #elif defined (HAVE_MMX)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
904 src+= stride*4;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
905 asm volatile(
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
906 "pxor %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
907 "lea -40(%%"REG_SP"), %%"REG_c" \n\t" // make space for 4 8-byte vars
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
908 "and "ALIGN_MASK", %%"REG_c" \n\t" // align
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
909 // 0 1 2 3 4 5 6 7
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
910 // %0 %0+%1 %0+2%1 eax+2%1 %0+4%1 eax+4%1 edx+%1 edx+2%1
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
911 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
912
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
913 "movq (%0), %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
914 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
915 "punpcklbw %%mm7, %%mm0 \n\t" // low part of line 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
916 "punpckhbw %%mm7, %%mm1 \n\t" // high part of line 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
917
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
918 "movq (%0, %1), %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
919 "lea (%0, %1, 2), %%"REG_a" \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
920 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
921 "punpcklbw %%mm7, %%mm2 \n\t" // low part of line 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
922 "punpckhbw %%mm7, %%mm3 \n\t" // high part of line 1
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
923
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
924 "movq (%%"REG_a"), %%mm4 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
925 "movq %%mm4, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
926 "punpcklbw %%mm7, %%mm4 \n\t" // low part of line 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
927 "punpckhbw %%mm7, %%mm5 \n\t" // high part of line 2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
928
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
929 "paddw %%mm0, %%mm0 \n\t" // 2L0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
930 "paddw %%mm1, %%mm1 \n\t" // 2H0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
931 "psubw %%mm4, %%mm2 \n\t" // L1 - L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
932 "psubw %%mm5, %%mm3 \n\t" // H1 - H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
933 "psubw %%mm2, %%mm0 \n\t" // 2L0 - L1 + L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
934 "psubw %%mm3, %%mm1 \n\t" // 2H0 - H1 + H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
935
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
936 "psllw $2, %%mm2 \n\t" // 4L1 - 4L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
937 "psllw $2, %%mm3 \n\t" // 4H1 - 4H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
938 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
939 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
940
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
941 "movq (%%"REG_a", %1), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
942 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
943 "punpcklbw %%mm7, %%mm2 \n\t" // L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
944 "punpckhbw %%mm7, %%mm3 \n\t" // H3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
945
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
946 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
947 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - H3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
948 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
949 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
950 "movq %%mm0, (%%"REG_c") \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
951 "movq %%mm1, 8(%%"REG_c") \n\t" // 2H0 - 5H1 + 5H2 - 2H3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
952
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
953 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
954 "movq %%mm0, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
955 "punpcklbw %%mm7, %%mm0 \n\t" // L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
956 "punpckhbw %%mm7, %%mm1 \n\t" // H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
957
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
958 "psubw %%mm0, %%mm2 \n\t" // L3 - L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
959 "psubw %%mm1, %%mm3 \n\t" // H3 - H4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
960 "movq %%mm2, 16(%%"REG_c") \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
961 "movq %%mm3, 24(%%"REG_c") \n\t" // H3 - H4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
962 "paddw %%mm4, %%mm4 \n\t" // 2L2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
963 "paddw %%mm5, %%mm5 \n\t" // 2H2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
964 "psubw %%mm2, %%mm4 \n\t" // 2L2 - L3 + L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
965 "psubw %%mm3, %%mm5 \n\t" // 2H2 - H3 + H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
966
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
967 "lea (%%"REG_a", %1), %0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
968 "psllw $2, %%mm2 \n\t" // 4L3 - 4L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
969 "psllw $2, %%mm3 \n\t" // 4H3 - 4H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
970 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
971 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
972 //50 opcodes so far
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
973 "movq (%0, %1, 2), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
974 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
975 "punpcklbw %%mm7, %%mm2 \n\t" // L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
976 "punpckhbw %%mm7, %%mm3 \n\t" // H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
977 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
978 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
979 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - 2L5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
980 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - 2H5
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
981
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
982 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
983 "punpcklbw %%mm7, %%mm6 \n\t" // L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
984 "psubw %%mm6, %%mm2 \n\t" // L5 - L6
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
985 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
986 "punpckhbw %%mm7, %%mm6 \n\t" // H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
987 "psubw %%mm6, %%mm3 \n\t" // H5 - H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
988
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
989 "paddw %%mm0, %%mm0 \n\t" // 2L4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
990 "paddw %%mm1, %%mm1 \n\t" // 2H4
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
991 "psubw %%mm2, %%mm0 \n\t" // 2L4 - L5 + L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
992 "psubw %%mm3, %%mm1 \n\t" // 2H4 - H5 + H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
993
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
994 "psllw $2, %%mm2 \n\t" // 4L5 - 4L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
995 "psllw $2, %%mm3 \n\t" // 4H5 - 4H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
996 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
997 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
998
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
999 "movq (%0, %1, 4), %%mm2 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1000 "movq %%mm2, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1001 "punpcklbw %%mm7, %%mm2 \n\t" // L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1002 "punpckhbw %%mm7, %%mm3 \n\t" // H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1003
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1004 "paddw %%mm2, %%mm2 \n\t" // 2L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1005 "paddw %%mm3, %%mm3 \n\t" // 2H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1006 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6 - 2L7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1007 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6 - 2H7
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1008
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1009 "movq (%%"REG_c"), %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1010 "movq 8(%%"REG_c"), %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1011
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1012 #ifdef HAVE_MMX2
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1013 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1014 "psubw %%mm0, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1015 "pmaxsw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1016 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1017 "psubw %%mm1, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1018 "pmaxsw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1019 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1020 "psubw %%mm2, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1021 "pmaxsw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1022 "movq %%mm7, %%mm6 \n\t" // 0
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1023 "psubw %%mm3, %%mm6 \n\t"
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1024 "pmaxsw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1025 #else
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1026 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1027 "pcmpgtw %%mm0, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1028 "pxor %%mm6, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1029 "psubw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1030 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1031 "pcmpgtw %%mm1, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1032 "pxor %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1033 "psubw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1034 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1035 "pcmpgtw %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1036 "pxor %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1037 "psubw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1038 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1039 "pcmpgtw %%mm3, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1040 "pxor %%mm6, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1041 "psubw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1042 #endif
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1043
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1044 #ifdef HAVE_MMX2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1045 "pminsw %%mm2, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1046 "pminsw %%mm3, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1047 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1048 "movq %%mm0, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1049 "psubusw %%mm2, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1050 "psubw %%mm6, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1051 "movq %%mm1, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1052 "psubusw %%mm3, %%mm6 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1053 "psubw %%mm6, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1054 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1055
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1056 "movd %2, %%mm2 \n\t" // QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1057 "punpcklbw %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
1058
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1059 "movq %%mm7, %%mm6 \n\t" // 0
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1060 "pcmpgtw %%mm4, %%mm6 \n\t" // sign(2L2 - 5L3 + 5L4 - 2L5)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1061 "pxor %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1062 "psubw %%mm6, %%mm4 \n\t" // |2L2 - 5L3 + 5L4 - 2L5|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1063 "pcmpgtw %%mm5, %%mm7 \n\t" // sign(2H2 - 5H3 + 5H4 - 2H5)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1064 "pxor %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1065 "psubw %%mm7, %%mm5 \n\t" // |2H2 - 5H3 + 5H4 - 2H5|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1066 // 100 opcodes
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1067 "psllw $3, %%mm2 \n\t" // 8QP
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1068 "movq %%mm2, %%mm3 \n\t" // 8QP
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1069 "pcmpgtw %%mm4, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1070 "pcmpgtw %%mm5, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1071 "pand %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1072 "pand %%mm3, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1073
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1074
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1075 "psubusw %%mm0, %%mm4 \n\t" // ld
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1076 "psubusw %%mm1, %%mm5 \n\t" // hd
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1077
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1078
211
f1074f0d4969 fix mangling with runtime cpu detection
atmos4
parents: 210
diff changeset
1079 "movq "MANGLE(w05)", %%mm2 \n\t" // 5
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1080 "pmullw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1081 "pmullw %%mm2, %%mm5 \n\t"
211
f1074f0d4969 fix mangling with runtime cpu detection
atmos4
parents: 210
diff changeset
1082 "movq "MANGLE(w20)", %%mm2 \n\t" // 32
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1083 "paddw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1084 "paddw %%mm2, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1085 "psrlw $6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1086 "psrlw $6, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1087
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1088 "movq 16(%%"REG_c"), %%mm0 \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1089 "movq 24(%%"REG_c"), %%mm1 \n\t" // H3 - H4
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1090
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1091 "pxor %%mm2, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1092 "pxor %%mm3, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1093
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1094 "pcmpgtw %%mm0, %%mm2 \n\t" // sign (L3-L4)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1095 "pcmpgtw %%mm1, %%mm3 \n\t" // sign (H3-H4)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1096 "pxor %%mm2, %%mm0 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1097 "pxor %%mm3, %%mm1 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1098 "psubw %%mm2, %%mm0 \n\t" // |L3-L4|
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1099 "psubw %%mm3, %%mm1 \n\t" // |H3-H4|
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1100 "psrlw $1, %%mm0 \n\t" // |L3 - L4|/2
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1101 "psrlw $1, %%mm1 \n\t" // |H3 - H4|/2
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1102
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1103 "pxor %%mm6, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1104 "pxor %%mm7, %%mm3 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1105 "pand %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1106 "pand %%mm3, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1107
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1108 #ifdef HAVE_MMX2
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1109 "pminsw %%mm0, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1110 "pminsw %%mm1, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1111 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1112 "movq %%mm4, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1113 "psubusw %%mm0, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1114 "psubw %%mm2, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1115 "movq %%mm5, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1116 "psubusw %%mm1, %%mm2 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1117 "psubw %%mm2, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1118 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1119 "pxor %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1120 "pxor %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1121 "psubw %%mm6, %%mm4 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1122 "psubw %%mm7, %%mm5 \n\t"
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1123 "packsswb %%mm5, %%mm4 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1124 "movq (%0), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1125 "paddb %%mm4, %%mm0 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1126 "movq %%mm0, (%0) \n\t"
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1127 "movq (%0, %1), %%mm0 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1128 "psubb %%mm4, %%mm0 \n\t"
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1129 "movq %%mm0, (%0, %1) \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1130
810
8c8c3b6ff8c1 using fewer registers ... to workaround something
michael
parents: 804
diff changeset
1131 : "+r" (src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1132 : "r" ((long)stride), "m" (c->pQPb)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1133 : "%"REG_a, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1134 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1135 #else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1136 const int l1= stride;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1137 const int l2= stride + l1;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1138 const int l3= stride + l2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1139 const int l4= stride + l3;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1140 const int l5= stride + l4;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1141 const int l6= stride + l5;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1142 const int l7= stride + l6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1143 const int l8= stride + l7;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1144 // const int l9= stride + l8;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
1145 int x;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1146 src+= stride*3;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
1147 for(x=0; x<BLOCK_SIZE; x++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1148 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1149 const int middleEnergy= 5*(src[l5] - src[l4]) + 2*(src[l3] - src[l6]);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1150 if(ABS(middleEnergy) < 8*c->QP)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1151 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1152 const int q=(src[l4] - src[l5])/2;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1153 const int leftEnergy= 5*(src[l3] - src[l2]) + 2*(src[l1] - src[l4]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1154 const int rightEnergy= 5*(src[l7] - src[l6]) + 2*(src[l5] - src[l8]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1155
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1156 int d= ABS(middleEnergy) - MIN( ABS(leftEnergy), ABS(rightEnergy) );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1157 d= MAX(d, 0);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1158
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1159 d= (5*d + 32) >> 6;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1160 d*= SIGN(-middleEnergy);
1161
1162 if(q>0)
1163 {
1164 d= d<0 ? 0 : d;
1165 d= d>q ? q : d;
1166 }
1167 else
1168 {
1169 d= d>0 ? 0 : d;
1170 d= d<q ? q : d;
1171 }
1172
1173 src[l4]-= d;
1174 src[l5]+= d;
1175 }
1176 src++;
1177 }
1178 #endif
1179 }
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1180 #endif //HAVE_ALTIVEC
1181
1182 #ifndef HAVE_ALTIVEC
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1183 static inline void RENAME(dering)(uint8_t src[], int stride, PPContext *c)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1184 {
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1185 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1186 asm volatile(
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1187 "pxor %%mm6, %%mm6 \n\t"
1188 "pcmpeqb %%mm7, %%mm7 \n\t"
1189 "movq %2, %%mm0 \n\t"
1190 "punpcklbw %%mm6, %%mm0 \n\t"
1191 "psrlw $1, %%mm0 \n\t"
1192 "psubw %%mm7, %%mm0 \n\t"
1193 "packuswb %%mm0, %%mm0 \n\t"
1194 "movq %%mm0, %3 \n\t"
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1195
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1196 "lea (%0, %1), %%"REG_a" \n\t"
1197 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1198
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1199 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1200 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1201
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1202 #undef FIND_MIN_MAX
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1203 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1204 #define REAL_FIND_MIN_MAX(addr)\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1205 "movq " #addr ", %%mm0 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1206 "pminub %%mm0, %%mm7 \n\t"\
1207 "pmaxub %%mm0, %%mm6 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1208 #else
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1209 #define REAL_FIND_MIN_MAX(addr)\
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1210 "movq " #addr ", %%mm0 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1211 "movq %%mm7, %%mm1 \n\t"\
1212 "psubusb %%mm0, %%mm6 \n\t"\
1213 "paddb %%mm0, %%mm6 \n\t"\
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1214 "psubusb %%mm0, %%mm1 \n\t"\
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1215 "psubb %%mm1, %%mm7 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1216 #endif
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1217 #define FIND_MIN_MAX(addr) REAL_FIND_MIN_MAX(addr)
1218
1219 FIND_MIN_MAX((%%REGa))
1220 FIND_MIN_MAX((%%REGa, %1))
1221 FIND_MIN_MAX((%%REGa, %1, 2))
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1222 FIND_MIN_MAX((%0, %1, 4))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1223 FIND_MIN_MAX((%%REGd))
1224 FIND_MIN_MAX((%%REGd, %1))
1225 FIND_MIN_MAX((%%REGd, %1, 2))
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1226 FIND_MIN_MAX((%0, %1, 8))
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1227
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1228 "movq %%mm7, %%mm4 \n\t"
1229 "psrlq $8, %%mm7 \n\t"
1230 #ifdef HAVE_MMX2
1231 "pminub %%mm4, %%mm7 \n\t" // min of pixels
1232 "pshufw $0xF9, %%mm7, %%mm4 \n\t"
1233 "pminub %%mm4, %%mm7 \n\t" // min of pixels
1234 "pshufw $0xFE, %%mm7, %%mm4 \n\t"
1235 "pminub %%mm4, %%mm7 \n\t"
1236 #else
1237 "movq %%mm7, %%mm1 \n\t"
1238 "psubusb %%mm4, %%mm1 \n\t"
1239 "psubb %%mm1, %%mm7 \n\t"
1240 "movq %%mm7, %%mm4 \n\t"
1241 "psrlq $16, %%mm7 \n\t"
1242 "movq %%mm7, %%mm1 \n\t"
1243 "psubusb %%mm4, %%mm1 \n\t"
1244 "psubb %%mm1, %%mm7 \n\t"
1245 "movq %%mm7, %%mm4 \n\t"
1246 "psrlq $32, %%mm7 \n\t"
1247 "movq %%mm7, %%mm1 \n\t"
1248 "psubusb %%mm4, %%mm1 \n\t"
1249 "psubb %%mm1, %%mm7 \n\t"
1250 #endif
1251
1252
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1253 "movq %%mm6, %%mm4 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1254 "psrlq $8, %%mm6 \n\t"
132
c4caf29acc1a 3dnow dering
michael
parents: 130
diff changeset
1255 #ifdef HAVE_MMX2
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1256 "pmaxub %%mm4, %%mm6 \n\t" // max of pixels
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1257 "pshufw $0xF9, %%mm6, %%mm4 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1258 "pmaxub %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1259 "pshufw $0xFE, %%mm6, %%mm4 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1260 "pmaxub %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1261 #else
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1262 "psubusb %%mm4, %%mm6 \n\t"
1263 "paddb %%mm4, %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1264 "movq %%mm6, %%mm4 \n\t"
1265 "psrlq $16, %%mm6 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1266 "psubusb %%mm4, %%mm6 \n\t"
1267 "paddb %%mm4, %%mm6 \n\t"
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1268 "movq %%mm6, %%mm4 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1269 "psrlq $32, %%mm6 \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1270 "psubusb %%mm4, %%mm6 \n\t"
1271 "paddb %%mm4, %%mm6 \n\t"
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1272 #endif
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1273 "movq %%mm6, %%mm0 \n\t" // max
1274 "psubb %%mm7, %%mm6 \n\t" // max - min
1275 "movd %%mm6, %%ecx \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1276 "cmpb "MANGLE(deringThreshold)", %%cl \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1277 " jb 1f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1278 "lea -24(%%"REG_SP"), %%"REG_c" \n\t"
1279 "and "ALIGN_MASK", %%"REG_c" \n\t"
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1280 PAVGB(%%mm0, %%mm7) // a=(max + min)/2
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
1281 "punpcklbw %%mm7, %%mm7 \n\t"
1282 "punpcklbw %%mm7, %%mm7 \n\t"
1283 "punpcklbw %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1284 "movq %%mm7, (%%"REG_c") \n\t"
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1285
1286 "movq (%0), %%mm0 \n\t" // L10
1287 "movq %%mm0, %%mm1 \n\t" // L10
1288 "movq %%mm0, %%mm2 \n\t" // L10
1289 "psllq $8, %%mm1 \n\t"
1290 "psrlq $8, %%mm2 \n\t"
1291 "movd -4(%0), %%mm3 \n\t"
1292 "movd 8(%0), %%mm4 \n\t"
1293 "psrlq $24, %%mm3 \n\t"
1294 "psllq $56, %%mm4 \n\t"
1295 "por %%mm3, %%mm1 \n\t" // L00
1296 "por %%mm4, %%mm2 \n\t" // L20
1297 "movq %%mm1, %%mm3 \n\t" // L00
1298 PAVGB(%%mm2, %%mm1) // (L20 + L00)/2
1299 PAVGB(%%mm0, %%mm1) // (L20 + L00 + 2L10)/4
1300 "psubusb %%mm7, %%mm0 \n\t"
1301 "psubusb %%mm7, %%mm2 \n\t"
1302 "psubusb %%mm7, %%mm3 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1303 "pcmpeqb "MANGLE(b00)", %%mm0 \n\t" // L10 > a ? 0 : -1
1304 "pcmpeqb "MANGLE(b00)", %%mm2 \n\t" // L20 > a ? 0 : -1
1305 "pcmpeqb "MANGLE(b00)", %%mm3 \n\t" // L00 > a ? 0 : -1
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1306 "paddb %%mm2, %%mm0 \n\t"
1307 "paddb %%mm3, %%mm0 \n\t"
1308
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1309 "movq (%%"REG_a"), %%mm2 \n\t" // L11
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1310 "movq %%mm2, %%mm3 \n\t" // L11
1311 "movq %%mm2, %%mm4 \n\t" // L11
1312 "psllq $8, %%mm3 \n\t"
1313 "psrlq $8, %%mm4 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1314 "movd -4(%%"REG_a"), %%mm5 \n\t"
1315 "movd 8(%%"REG_a"), %%mm6 \n\t"
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1316 "psrlq $24, %%mm5 \n\t"
1317 "psllq $56, %%mm6 \n\t"
1318 "por %%mm5, %%mm3 \n\t" // L01
1319 "por %%mm6, %%mm4 \n\t" // L21
1320 "movq %%mm3, %%mm5 \n\t" // L01
1321 PAVGB(%%mm4, %%mm3) // (L21 + L01)/2
1322 PAVGB(%%mm2, %%mm3) // (L21 + L01 + 2L11)/4
1323 "psubusb %%mm7, %%mm2 \n\t"
1324 "psubusb %%mm7, %%mm4 \n\t"
1325 "psubusb %%mm7, %%mm5 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1326 "pcmpeqb "MANGLE(b00)", %%mm2 \n\t" // L11 > a ? 0 : -1
1327 "pcmpeqb "MANGLE(b00)", %%mm4 \n\t" // L21 > a ? 0 : -1
1328 "pcmpeqb "MANGLE(b00)", %%mm5 \n\t" // L01 > a ? 0 : -1
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1329 "paddb %%mm4, %%mm2 \n\t"
1330 "paddb %%mm5, %%mm2 \n\t"
1331 // 0, 2, 3, 1
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1332 #define REAL_DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1) \
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1333 "movq " #src ", " #sx " \n\t" /* src[0] */\
1334 "movq " #sx ", " #lx " \n\t" /* src[0] */\
1335 "movq " #sx ", " #t0 " \n\t" /* src[0] */\
1336 "psllq $8, " #lx " \n\t"\
1337 "psrlq $8, " #t0 " \n\t"\
1338 "movd -4" #src ", " #t1 " \n\t"\
1339 "psrlq $24, " #t1 " \n\t"\
1340 "por " #t1 ", " #lx " \n\t" /* src[-1] */\
1341 "movd 8" #src ", " #t1 " \n\t"\
1342 "psllq $56, " #t1 " \n\t"\
1343 "por " #t1 ", " #t0 " \n\t" /* src[+1] */\
1344 "movq " #lx ", " #t1 " \n\t" /* src[-1] */\
1345 PAVGB(t0, lx) /* (src[-1] + src[+1])/2 */\
1346 PAVGB(sx, lx) /* (src[-1] + 2src[0] + src[+1])/4 */\
135
5083d662ff85 faster dering
michael
parents: 134
diff changeset
1347 PAVGB(lx, pplx) \
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1348 "movq " #lx ", 8(%%"REG_c") \n\t"\
1349 "movq (%%"REG_c"), " #lx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1350 "psubusb " #lx ", " #t1 " \n\t"\
1351 "psubusb " #lx ", " #t0 " \n\t"\
1352 "psubusb " #lx ", " #sx " \n\t"\
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1353 "movq "MANGLE(b00)", " #lx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1354 "pcmpeqb " #lx ", " #t1 " \n\t" /* src[-1] > a ? 0 : -1*/\
1355 "pcmpeqb " #lx ", " #t0 " \n\t" /* src[+1] > a ? 0 : -1*/\
1356 "pcmpeqb " #lx ", " #sx " \n\t" /* src[0] > a ? 0 : -1*/\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1357 "paddb " #t1 ", " #t0 " \n\t"\
1358 "paddb " #t0 ", " #sx " \n\t"\
1359 \
1360 PAVGB(plx, pplx) /* filtered */\
1361 "movq " #dst ", " #t0 " \n\t" /* dst */\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1362 "movq " #t0 ", " #t1 " \n\t" /* dst */\
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1363 "psubusb %3, " #t0 " \n\t"\
1364 "paddusb %3, " #t1 " \n\t"\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1365 PMAXUB(t0, pplx)\
1366 PMINUB(t1, pplx, t0)\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1367 "paddb " #sx ", " #ppsx " \n\t"\
1368 "paddb " #psx ", " #ppsx " \n\t"\
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
1369 "#paddb "MANGLE(b02)", " #ppsx " \n\t"\
1370 "pand "MANGLE(b08)", " #ppsx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1371 "pcmpeqb " #lx ", " #ppsx " \n\t"\
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1372 "pand " #ppsx ", " #pplx " \n\t"\
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1373 "pandn " #dst ", " #ppsx " \n\t"\
140
52ed0baddd56 minor speedup
michael
parents: 135
diff changeset
1374 "por " #pplx ", " #ppsx " \n\t"\
135
5083d662ff85 faster dering
michael
parents: 134
diff changeset
1375 "movq " #ppsx ", " #dst " \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1376 "movq 8(%%"REG_c"), " #lx " \n\t"
1377
1378 #define DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1) \
1379 REAL_DERING_CORE(dst,src,ppsx,psx,sx,pplx,plx,lx,t0,t1)
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
1380 /*
1381 0000000
1382 1111111
1383
1384 1111110
1385 1111101
1386 1111100
1387 1111011
1388 1111010
1389 1111001
1390
1391 1111000
1392 1110111
1393
1394 */
1395 //DERING_CORE(dst,src ,ppsx ,psx ,sx ,pplx ,plx ,lx ,t0 ,t1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1396 DERING_CORE((%%REGa),(%%REGa, %1) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
1397 DERING_CORE((%%REGa, %1),(%%REGa, %1, 2) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
1398 DERING_CORE((%%REGa, %1, 2),(%0, %1, 4) ,%%mm4,%%mm0,%%mm2,%%mm5,%%mm1,%%mm3,%%mm6,%%mm7)
1399 DERING_CORE((%0, %1, 4),(%%REGd) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
1400 DERING_CORE((%%REGd),(%%REGd, %1) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
1401 DERING_CORE((%%REGd, %1), (%%REGd, %1, 2),%%mm4,%%mm0,%%mm2,%%mm5,%%mm1,%%mm3,%%mm6,%%mm7)
1402 DERING_CORE((%%REGd, %1, 2),(%0, %1, 8) ,%%mm0,%%mm2,%%mm4,%%mm1,%%mm3,%%mm5,%%mm6,%%mm7)
1403 DERING_CORE((%0, %1, 8),(%%REGd, %1, 4) ,%%mm2,%%mm4,%%mm0,%%mm3,%%mm5,%%mm1,%%mm6,%%mm7)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1404
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1405 "1: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1406 : : "r" (src), "r" ((long)stride), "m" (c->pQPb), "m"(c->pQPb2)
1407 : "%"REG_a, "%"REG_d, "%"REG_c
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1408 );
1409 #else
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1410 int y;
1411 int min=255;
1412 int max=0;
1413 int avg;
1414 uint8_t *p;
1415 int s[10];
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1416 const int QP2= c->QP/2 + 1;
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1417
1418 for(y=1; y<9; y++)
1419 {
1420 int x;
1421 p= src + stride*y;
1422 for(x=1; x<9; x++)
1423 {
1424 p++;
1425 if(*p > max) max= *p;
1426 if(*p < min) min= *p;
1427 }
1428 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1429 avg= (min + max + 1)>>1;
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1430
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1431 if(max - min <deringThreshold) return;
1432
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1433 for(y=0; y<10; y++)
1434 {
1435 int t = 0;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1436
1437 if(src[stride*y + 0] > avg) t+= 1;
1438 if(src[stride*y + 1] > avg) t+= 2;
1439 if(src[stride*y + 2] > avg) t+= 4;
1440 if(src[stride*y + 3] > avg) t+= 8;
1441 if(src[stride*y + 4] > avg) t+= 16;
1442 if(src[stride*y + 5] > avg) t+= 32;
1443 if(src[stride*y + 6] > avg) t+= 64;
1444 if(src[stride*y + 7] > avg) t+= 128;
1445 if(src[stride*y + 8] > avg) t+= 256;
1446 if(src[stride*y + 9] > avg) t+= 512;
1447
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1448 t |= (~t)<<16;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1449 t &= (t<<1) & (t>>1);
2c469e390117 dering in c
michael
parents: 133
diff changeset
1450 s[y] = t;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1451 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1452
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1453 for(y=1; y<9; y++)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1454 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1455 int t = s[y-1] & s[y] & s[y+1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1456 t|= t>>16;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1457 s[y-1]= t;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1458 }
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1459
2c469e390117 dering in c
michael
parents: 133
diff changeset
1460 for(y=1; y<9; y++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1461 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1462 int x;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1463 int t = s[y-1];
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1464
2c469e390117 dering in c
michael
parents: 133
diff changeset
1465 p= src + stride*y;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1466 for(x=1; x<9; x++)
2c469e390117 dering in c
michael
parents: 133
diff changeset
1467 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1468 p++;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1469 if(t & (1<<x))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1470 {
2c469e390117 dering in c
michael
parents: 133
diff changeset
1471 int f= (*(p-stride-1)) + 2*(*(p-stride)) + (*(p-stride+1))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1472 +2*(*(p -1)) + 4*(*p ) + 2*(*(p +1))
2c469e390117 dering in c
michael
parents: 133
diff changeset
1473 +(*(p+stride-1)) + 2*(*(p+stride)) + (*(p+stride+1));
2c469e390117 dering in c
michael
parents: 133
diff changeset
1474 f= (f + 8)>>4;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1475
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1476 #ifdef DEBUG_DERING_THRESHOLD
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1477 asm volatile("emms\n\t":);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1478 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1479 static long long numPixels=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1480 if(x!=1 && x!=8 && y!=1 && y!=8) numPixels++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1481 // if((max-min)<20 || (max-min)*QP<200)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1482 // if((max-min)*QP < 500)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1483 // if(max-min<QP/2)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1484 if(max-min < 20)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1485 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1486 static int numSkiped=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1487 static int errorSum=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1488 static int worstQP=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1489 static int worstRange=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1490 static int worstDiff=0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1491 int diff= (f - *p);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1492 int absDiff= ABS(diff);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1493 int error= diff*diff;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1494
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1495 if(x==1 || x==8 || y==1 || y==8) continue;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1496
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1497 numSkiped++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1498 if(absDiff > worstDiff)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1499 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1500 worstDiff= absDiff;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1501 worstQP= QP;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1502 worstRange= max-min;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1503 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1504 errorSum+= error;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1505
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1506 if(1024LL*1024LL*1024LL % numSkiped == 0)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1507 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1508 printf( "sum:%1.3f, skip:%d, wQP:%d, "
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1509 "wRange:%d, wDiff:%d, relSkip:%1.3f\n",
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1510 (float)errorSum/numSkiped, numSkiped, worstQP, worstRange,
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1511 worstDiff, (float)numSkiped/numPixels);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1512 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1513 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1514 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1515 #endif
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1516 if (*p + QP2 < f) *p= *p + QP2;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1517 else if(*p - QP2 > f) *p= *p - QP2;
134
2c469e390117 dering in c
michael
parents: 133
diff changeset
1518 else *p=f;
2c469e390117 dering in c
michael
parents: 133
diff changeset
1519 }
2c469e390117 dering in c
michael
parents: 133
diff changeset
1520 }
2c469e390117 dering in c
michael
parents: 133
diff changeset
1521 }
167
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1522 #ifdef DEBUG_DERING_THRESHOLD
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1523 if(max-min < 20)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1524 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1525 for(y=1; y<9; y++)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1526 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1527 int x;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1528 int t = 0;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1529 p= src + stride*y;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1530 for(x=1; x<9; x++)
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1531 {
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1532 p++;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1533 *p = MIN(*p + 20, 255);
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1534 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1535 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1536 // src[0] = src[7]=src[stride*7]=src[stride*7 + 7]=255;
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1537 }
2d97f0157a79 faster dering
michael
parents: 166
diff changeset
1538 #endif
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1539 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1540 }
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
1541 #endif //HAVE_ALTIVEC
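The C dering path above replaces a flagged pixel with a 3x3 weighted average (weights 1 2 1 / 2 4 2 / 1 2 1, rounded, divided by 16), then limits the change to at most QP2 in either direction. The following is a minimal standalone sketch of those two steps only. The helper names smooth3x3 and limit_change are hypothetical and do not exist in libpostproc, and the sketch leaves out the min/max scan and the bit-mask selection of pixels.

```c
#include <assert.h>
#include <stdint.h>

/* 3x3 low-pass around *p with weights 1 2 1 / 2 4 2 / 1 2 1.
 * The weights sum to 16, so (f + 8) >> 4 is a rounded division.
 * Hypothetical helper, written for illustration only. */
static int smooth3x3(const uint8_t *p, int stride)
{
    int f =   p[-stride - 1] + 2 * p[-stride] +     p[-stride + 1]
          + 2 * p[-1]        + 4 * p[0]       + 2 * p[1]
          +   p[stride - 1]  + 2 * p[stride]  +     p[stride + 1];
    return (f + 8) >> 4;
}

/* Move orig towards the filtered value f, but by no more than qp2.
 * Mirrors the three-way if/else at the end of the C dering loop. */
static int limit_change(int orig, int f, int qp2)
{
    assert(qp2 >= 0);
    if (orig + qp2 < f) return orig + qp2;
    if (orig - qp2 > f) return orig - qp2;
    return f;
}
```

For a single bright pixel of value 160 on a black background the filter yields 40; with qp2 = 10 the stored value would only drop to 150, which keeps strong detail from being smoothed away in one pass.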
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
1542
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1543 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1544 * Deinterlaces the given block by linearly interpolating every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1545 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1546 * lines 0-3 have been passed through the deblock / dering filters already, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1547 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1548 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1549 static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1550 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1551 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1552 src+= 4*stride;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1553 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1554 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1555 "lea (%%"REG_a", %1, 4), %%"REG_c" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1556 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1557 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1558
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1559 "movq (%0), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1560 "movq (%%"REG_a", %1), %%mm1 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1561 PAVGB(%%mm1, %%mm0)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1562 "movq %%mm0, (%%"REG_a") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1563 "movq (%0, %1, 4), %%mm0 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1564 PAVGB(%%mm0, %%mm1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1565 "movq %%mm1, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1566 "movq (%%"REG_c", %1), %%mm1 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1567 PAVGB(%%mm1, %%mm0)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1568 "movq %%mm0, (%%"REG_c") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1569 "movq (%0, %1, 8), %%mm0 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1570 PAVGB(%%mm0, %%mm1)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1571 "movq %%mm1, (%%"REG_c", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1572
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1573 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1574 : "%"REG_a, "%"REG_c
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1575 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1576 #else
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1577 int a, b, x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1578 src+= 4*stride;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1579
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1580 for(x=0; x<2; x++){
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1581 a= *(uint32_t*)&src[stride*0];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1582 b= *(uint32_t*)&src[stride*2];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1583 *(uint32_t*)&src[stride*1]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1584 a= *(uint32_t*)&src[stride*4];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1585 *(uint32_t*)&src[stride*3]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1586 b= *(uint32_t*)&src[stride*6];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1587 *(uint32_t*)&src[stride*5]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1588 a= *(uint32_t*)&src[stride*8];
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1589 *(uint32_t*)&src[stride*7]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1590 src += 4;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1591 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1592 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1593 }
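The C fallback above averages four pixels per 32-bit word with (a|b) - (((a^b)&0xFEFEFEFE)>>1). Since a+b = 2*(a|b) - (a^b), this is the per-byte average rounded up, and the 0xFE mask stops the shifted-out bit of one byte from leaking into its neighbour. A minimal sketch that checks the identity against a plain per-byte (x+y+1)>>1 follows. The names avg4_swar, avg4_ref and avg4_selfcheck are hypothetical and are not part of libpostproc.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded-up average of four packed 8-bit values in one 32-bit word.
 * Same expression as in the C deinterlacer; no carry crosses a byte lane. */
static uint32_t avg4_swar(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEUL) >> 1);
}

/* Reference: (x + y + 1) >> 1 computed separately for each byte. */
static uint32_t avg4_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int i;
    for (i = 0; i < 4; i++) {
        uint32_t x = (a >> (8 * i)) & 0xFF;
        uint32_t y = (b >> (8 * i)) & 0xFF;
        r |= ((x + y + 1) >> 1) << (8 * i);
    }
    return r;
}

/* Tries every byte pair in the lowest and highest lane at the same time. */
static void avg4_selfcheck(void)
{
    uint32_t x, y;
    for (x = 0; x < 256; x++) {
        for (y = 0; y < 256; y++) {
            uint32_t a = x | (y << 24);
            uint32_t b = y | (x << 24);
            assert(avg4_swar(a, b) == avg4_ref(a, b));
        }
    }
}
```

Rounding up matches the behaviour of the PAVGB-style average used in the MMX2/3DNow path of the same function, which is presumably why the C version uses this form rather than a truncating average.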
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1594
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1595 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1596 * Deinterlaces the given block by cubic interpolating every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1597 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1598 * lines 0-3 have been passed through the deblock / dering filters already, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1599 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1600 * this filter will read lines 3-15 and write lines 6-12 (every second line: 6, 8, 10, 12)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1601 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1602 static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1603 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1604 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1605 src+= stride*3;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1606 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1607 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1608 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1609 "lea (%%"REG_d", %1, 4), %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1610 "add %1, %%"REG_c" \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1611 "pxor %%mm7, %%mm7 \n\t"
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1612 // 0 1 2 3 4 5 6 7 8 9 10
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1613 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1614
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1615 #define REAL_DEINT_CUBIC(a,b,c,d,e)\
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1616 "movq " #a ", %%mm0 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1617 "movq " #b ", %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1618 "movq " #d ", %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1619 "movq " #e ", %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1620 PAVGB(%%mm2, %%mm1) /* (b+d) /2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1621 PAVGB(%%mm3, %%mm0) /* (a+e) /2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1622 "movq %%mm0, %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1623 "punpcklbw %%mm7, %%mm0 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1624 "punpckhbw %%mm7, %%mm2 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1625 "movq %%mm1, %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1626 "punpcklbw %%mm7, %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1627 "punpckhbw %%mm7, %%mm3 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1628 "psubw %%mm1, %%mm0 \n\t" /* L(a+e - (b+d))/2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1629 "psubw %%mm3, %%mm2 \n\t" /* H(a+e - (b+d))/2 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1630 "psraw $3, %%mm0 \n\t" /* L(a+e - (b+d))/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1631 "psraw $3, %%mm2 \n\t" /* H(a+e - (b+d))/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1632 "psubw %%mm0, %%mm1 \n\t" /* L(9b + 9d - a - e)/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1633 "psubw %%mm2, %%mm3 \n\t" /* H(9b + 9d - a - e)/16 */\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1634 "packuswb %%mm3, %%mm1 \n\t"\
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
1635 "movq %%mm1, " #c " \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1636 #define DEINT_CUBIC(a,b,c,d,e) REAL_DEINT_CUBIC(a,b,c,d,e)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1637
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1638 DEINT_CUBIC((%0), (%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4), (%%REGd, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1639 DEINT_CUBIC((%%REGa, %1), (%0, %1, 4), (%%REGd), (%%REGd, %1), (%0, %1, 8))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1640 DEINT_CUBIC((%0, %1, 4), (%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8), (%%REGc))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1641 DEINT_CUBIC((%%REGd, %1), (%0, %1, 8), (%%REGd, %1, 4), (%%REGc), (%%REGc, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1642
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1643 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1644 : "%"REG_a, "%"REG_d, "%"REG_c
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1645 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1646 #else
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1647 int x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1648 src+= stride*3;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1649 for(x=0; x<8; x++)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1650 {
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1651 src[stride*3] = CLIP((-src[0] + 9*src[stride*2] + 9*src[stride*4] - src[stride*6])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1652 src[stride*5] = CLIP((-src[stride*2] + 9*src[stride*4] + 9*src[stride*6] - src[stride*8])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1653 src[stride*7] = CLIP((-src[stride*4] + 9*src[stride*6] + 9*src[stride*8] - src[stride*10])>>4);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1654 src[stride*9] = CLIP((-src[stride*6] + 9*src[stride*8] + 9*src[stride*10] - src[stride*12])>>4);
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1655 src++;
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1656 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1657 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1658 }
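The C branch above rebuilds each missing line from the four nearest lines of the kept field with the taps (-1 9 9 -1)/16, then clips to 0..255. A minimal scalar sketch of one such tap follows. The names clip_u8, cubic_tap and cubic_fill_line3 are hypothetical; the real code uses the file's own CLIP macro and shifts the raw sum directly, whereas this sketch returns 0 for a negative sum before shifting so that it does not rely on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 8-bit pixel range (stand-in for the CLIP macro). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* a, b, c, d: the same column in four consecutive lines of the kept field.
 * Returns the interpolated pixel for the missing line between b and c. */
static uint8_t cubic_tap(int a, int b, int c, int d)
{
    int v = -a + 9 * b + 9 * c - d;
    if (v < 0)
        return 0; /* a negative sum clips to 0 whatever the shift would give */
    return clip_u8(v >> 4);
}

/* Rebuild line 3 of one column from lines 0, 2, 4 and 6,
 * as the first statement of the C loop does. */
static void cubic_fill_line3(uint8_t *col, int stride)
{
    assert(stride > 0);
    col[stride * 3] = cubic_tap(col[0], col[stride * 2],
                                col[stride * 4], col[stride * 6]);
}
```

On a linear ramp (10, 20, 30, 40) the tap returns 25, the midpoint of the two inner samples, and on flat input it returns the input unchanged. The negative outer taps can overshoot near sharp edges, which is why the clip step is needed.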
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1659
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1660 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1661 * Deinterlaces the given block by filtering every second line with a (-1 4 2 4 -1) filter.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1662 * will be called for every 8x8 block and can read & write from line 4-15
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1663 * lines 0-3 have been passed through the deblock / dering filters already, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1664 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1665 * this filter will read lines 4-13 and write 5-11
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1666 */
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1667 static inline void RENAME(deInterlaceFF)(uint8_t src[], int stride, uint8_t *tmp)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1668 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1669 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1670 src+= stride*4;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1671 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1672 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1673 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1674 "pxor %%mm7, %%mm7 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1675 "movq (%2), %%mm0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1676 // 0 1 2 3 4 5 6 7 8 9 10
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1677 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1679 #define REAL_DEINT_FF(a,b,c,d)\
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1680 "movq " #a ", %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1681 "movq " #b ", %%mm2 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1682 "movq " #c ", %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1683 "movq " #d ", %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1684 PAVGB(%%mm3, %%mm1) \
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1685 PAVGB(%%mm4, %%mm0) \
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1686 "movq %%mm0, %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1687 "punpcklbw %%mm7, %%mm0 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1688 "punpckhbw %%mm7, %%mm3 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1689 "movq %%mm1, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1690 "punpcklbw %%mm7, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1691 "punpckhbw %%mm7, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1692 "psllw $2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1693 "psllw $2, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1694 "psubw %%mm0, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1695 "psubw %%mm3, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1696 "movq %%mm2, %%mm5 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1697 "movq %%mm2, %%mm0 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1698 "punpcklbw %%mm7, %%mm2 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1699 "punpckhbw %%mm7, %%mm5 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1700 "paddw %%mm2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1701 "paddw %%mm5, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1702 "psraw $2, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1703 "psraw $2, %%mm4 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1704 "packuswb %%mm4, %%mm1 \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1705 "movq %%mm1, " #b " \n\t"\
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1706
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1707 #define DEINT_FF(a,b,c,d) REAL_DEINT_FF(a,b,c,d)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1708
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1709 DEINT_FF((%0) , (%%REGa) , (%%REGa, %1), (%%REGa, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1710 DEINT_FF((%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4), (%%REGd) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1711 DEINT_FF((%0, %1, 4), (%%REGd) , (%%REGd, %1), (%%REGd, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1712 DEINT_FF((%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8), (%%REGd, %1, 4))
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1713
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1714 "movq %%mm0, (%2) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1715 : : "r" (src), "r" ((long)stride), "r"(tmp)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1716 : "%"REG_a, "%"REG_d
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1717 );
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1718 #else
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1719 int x;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1720 src+= stride*4;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1721 for(x=0; x<8; x++)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1722 {
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1723 int t1= tmp[x];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1724 int t2= src[stride*1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1725
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1726 src[stride*1]= CLIP((-t1 + 4*src[stride*0] + 2*t2 + 4*src[stride*2] - src[stride*3] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1727 t1= src[stride*4];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1728 src[stride*3]= CLIP((-t2 + 4*src[stride*2] + 2*t1 + 4*src[stride*4] - src[stride*5] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1729 t2= src[stride*6];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1730 src[stride*5]= CLIP((-t1 + 4*src[stride*4] + 2*t2 + 4*src[stride*6] - src[stride*7] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1731 t1= src[stride*8];
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1732 src[stride*7]= CLIP((-t2 + 4*src[stride*6] + 2*t1 + 4*src[stride*8] - src[stride*9] + 4)>>3);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1733 tmp[x]= t1;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1734
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1735 src++;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1736 }
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1737 #endif
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1738 }
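The first output line of the C fallback above weights five vertically adjacent samples by (-1 4 2 4 -1), adds 4 for rounding, shifts right by 3 and clips to a pixel value. A minimal sketch of that single tap follows. The helper names `shift3_clip` and `ff_tap` are hypothetical and do not appear in the source. Unlike `CLIP((...)>>3)`, the sketch tests for a negative sum before shifting, which gives the same result on compilers with an arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in the source): divide by 8, then clamp to the
 * 0..255 pixel range. `sum` must already include the +4 rounding term.
 * Negative sums are handled before the shift so the result does not depend
 * on how the compiler shifts negative values. */
static uint8_t shift3_clip(int sum)
{
    if (sum < 0)
        return 0;
    sum >>= 3;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Hypothetical helper: one output sample of the (-1 4 2 4 -1)/8 kernel.
 * m2 and m1 are the samples two and one line above, c is the sample being
 * replaced, p1 and p2 are the samples one and two lines below. */
static uint8_t ff_tap(int m2, int m1, int c, int p1, int p2)
{
    return shift3_clip(-m2 + 4 * m1 + 2 * c + 4 * p1 - p2 + 4);
}
```

The weights sum to 8, so a flat area passes through unchanged, while the negative outer taps can push the sum outside 0..255, which is why the clip is needed.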
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1739
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1740 /**
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1741 * Deinterlaces the given block by filtering every line with a (-1 2 6 2 -1) filter.
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1742 * will be called for every 8x8 block and can read & write from line 4-15
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1743 * lines 0-3 have been passed through the deblock / dering filters already, but can be read too
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1744 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1745 * this filter will read lines 4-13 and write 4-11
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1746 */
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1747 static inline void RENAME(deInterlaceL5)(uint8_t src[], int stride, uint8_t *tmp, uint8_t *tmp2)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1748 {
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1749 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1750 src+= stride*4;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1751 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1752 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1753 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1754 "pxor %%mm7, %%mm7 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1755 "movq (%2), %%mm0 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1756 "movq (%3), %%mm1 \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1757 // 0 1 2 3 4 5 6 7 8 9 10
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1758 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1 ecx
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1759
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1760 #define REAL_DEINT_L5(t1,t2,a,b,c)\
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1761 "movq " #a ", %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1762 "movq " #b ", %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1763 "movq " #c ", %%mm4 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1764 PAVGB(t2, %%mm3) \
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1765 PAVGB(t1, %%mm4) \
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1766 "movq %%mm2, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1767 "movq %%mm2, " #t1 " \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1768 "punpcklbw %%mm7, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1769 "punpckhbw %%mm7, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1770 "movq %%mm2, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1771 "paddw %%mm2, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1772 "paddw %%mm6, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1773 "movq %%mm5, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1774 "paddw %%mm5, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1775 "paddw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1776 "movq %%mm3, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1777 "punpcklbw %%mm7, %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1778 "punpckhbw %%mm7, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1779 "paddw %%mm3, %%mm3 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1780 "paddw %%mm6, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1781 "paddw %%mm3, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1782 "paddw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1783 "movq %%mm4, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1784 "punpcklbw %%mm7, %%mm4 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1785 "punpckhbw %%mm7, %%mm6 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1786 "psubw %%mm4, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1787 "psubw %%mm6, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1788 "psraw $2, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1789 "psraw $2, %%mm5 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1790 "packuswb %%mm5, %%mm2 \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1791 "movq %%mm2, " #a " \n\t"\
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1792
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1793 #define DEINT_L5(t1,t2,a,b,c) REAL_DEINT_L5(t1,t2,a,b,c)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1794
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1795 DEINT_L5(%%mm0, %%mm1, (%0) , (%%REGa) , (%%REGa, %1) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1796 DEINT_L5(%%mm1, %%mm0, (%%REGa) , (%%REGa, %1) , (%%REGa, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1797 DEINT_L5(%%mm0, %%mm1, (%%REGa, %1) , (%%REGa, %1, 2), (%0, %1, 4) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1798 DEINT_L5(%%mm1, %%mm0, (%%REGa, %1, 2), (%0, %1, 4) , (%%REGd) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1799 DEINT_L5(%%mm0, %%mm1, (%0, %1, 4) , (%%REGd) , (%%REGd, %1) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1800 DEINT_L5(%%mm1, %%mm0, (%%REGd) , (%%REGd, %1) , (%%REGd, %1, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1801 DEINT_L5(%%mm0, %%mm1, (%%REGd, %1) , (%%REGd, %1, 2), (%0, %1, 8) )
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1802 DEINT_L5(%%mm1, %%mm0, (%%REGd, %1, 2), (%0, %1, 8) , (%%REGd, %1, 4))
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1803
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1804 "movq %%mm0, (%2) \n\t"
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1805 "movq %%mm1, (%3) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1806 : : "r" (src), "r" ((long)stride), "r"(tmp), "r"(tmp2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1807 : "%"REG_a, "%"REG_d
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1808 );
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1809 #else
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1810 int x;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1811 src+= stride*4;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1812 for(x=0; x<8; x++)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1813 {
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1814 int t1= tmp[x];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1815 int t2= tmp2[x];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1816 int t3= src[0];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1817
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1818 src[stride*0]= CLIP((-(t1 + src[stride*2]) + 2*(t2 + src[stride*1]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1819 t1= src[stride*1];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1820 src[stride*1]= CLIP((-(t2 + src[stride*3]) + 2*(t3 + src[stride*2]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1821 t2= src[stride*2];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1822 src[stride*2]= CLIP((-(t3 + src[stride*4]) + 2*(t1 + src[stride*3]) + 6*t2 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1823 t3= src[stride*3];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1824 src[stride*3]= CLIP((-(t1 + src[stride*5]) + 2*(t2 + src[stride*4]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1825 t1= src[stride*4];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1826 src[stride*4]= CLIP((-(t2 + src[stride*6]) + 2*(t3 + src[stride*5]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1827 t2= src[stride*5];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1828 src[stride*5]= CLIP((-(t3 + src[stride*7]) + 2*(t1 + src[stride*6]) + 6*t2 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1829 t3= src[stride*6];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1830 src[stride*6]= CLIP((-(t1 + src[stride*8]) + 2*(t2 + src[stride*7]) + 6*t3 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1831 t1= src[stride*7];
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1832 src[stride*7]= CLIP((-(t2 + src[stride*9]) + 2*(t3 + src[stride*8]) + 6*t1 + 4)>>3);
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1833
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1834 tmp[x]= t3;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1835 tmp2[x]= t1;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1836
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1837 src++;
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1838 }
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1839 #endif
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1840 }
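The C fallback of deInterlaceL5 filters each column in place, so it has to keep the unfiltered values of the two previous lines (rotating through t1, t2 and t3, with tmp and tmp2 carrying them across blocks). The sketch below illustrates that idea on a single column and checks it against an out-of-place reference. All function and parameter names are hypothetical, and the loop is a simplified stand-in for the unrolled code, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: divide a rounded sum by 8 and clamp to 0..255. */
static uint8_t shift3_clip(int sum)
{
    if (sum < 0)
        return 0;
    sum >>= 3;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Out-of-place reference for the (-1 2 6 2 -1)/8 kernel.
 * `in` holds n + 4 samples: two history samples, the n samples to filter,
 * and two lookahead samples. `out` receives n filtered samples. */
static void l5_column_ref(const uint8_t *in, uint8_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        const uint8_t *p = in + i; /* p[2] is the centre tap */
        out[i] = shift3_clip(-(p[0] + p[4]) + 2 * (p[1] + p[3]) + 6 * p[2] + 4);
    }
}

/* In-place variant. `col` holds n samples to filter plus two lookahead
 * samples. *h2 and *h1 are the unfiltered samples two and one line above
 * col[0]; on return they hold the last two unfiltered samples, ready for
 * the next block (the role tmp and tmp2 play in the source). */
static void l5_column_inplace(uint8_t *col, int n, uint8_t *h2, uint8_t *h1)
{
    int a = *h2, b = *h1;
    for (int i = 0; i < n; i++) {
        int c = col[i]; /* save the unfiltered value before overwriting it */
        col[i] = shift3_clip(-(a + col[i + 2]) + 2 * (b + col[i + 1]) + 6 * c + 4);
        a = b;
        b = c;
    }
    *h2 = (uint8_t)a;
    *h1 = (uint8_t)b;
}
```

Reading the lookahead samples directly from `col` is safe because the loop only ever overwrites positions at or behind the centre tap.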
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1841
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
1842 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1843 * Deinterlaces the given block by filtering all lines with a (1 2 1) filter.
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1844 * will be called for every 8x8 block and can read & write from line 4-15
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1845 * lines 0-3 have been passed through the deblock / dering filters already, but can be read too
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1846 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1847 * this filter will read lines 4-13 and write 4-11
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1848 */
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1849 static inline void RENAME(deInterlaceBlendLinear)(uint8_t src[], int stride, uint8_t *tmp)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1850 {
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1851 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1852 src+= 4*stride;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1853 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1854 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1855 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1856 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1857 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1858
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1859 "movq (%2), %%mm0 \n\t" // L0
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1860 "movq (%%"REG_a"), %%mm1 \n\t" // L2
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1861 PAVGB(%%mm1, %%mm0) // L0+L2
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1862 "movq (%0), %%mm2 \n\t" // L1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1863 PAVGB(%%mm2, %%mm0)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1864 "movq %%mm0, (%0) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1865 "movq (%%"REG_a", %1), %%mm0 \n\t" // L3
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1866 PAVGB(%%mm0, %%mm2) // L1+L3
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1867 PAVGB(%%mm1, %%mm2) // 2L2 + L1 + L3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1868 "movq %%mm2, (%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1869 "movq (%%"REG_a", %1, 2), %%mm2 \n\t" // L4
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1870 PAVGB(%%mm2, %%mm1) // L2+L4
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1871 PAVGB(%%mm0, %%mm1) // 2L3 + L2 + L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1872 "movq %%mm1, (%%"REG_a", %1) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1873 "movq (%0, %1, 4), %%mm1 \n\t" // L5
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1874 PAVGB(%%mm1, %%mm0) // L3+L5
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1875 PAVGB(%%mm2, %%mm0) // 2L4 + L3 + L5
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1876 "movq %%mm0, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1877 "movq (%%"REG_d"), %%mm0 \n\t" // L6
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1878 PAVGB(%%mm0, %%mm2) // L4+L6
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1879 PAVGB(%%mm1, %%mm2) // 2L5 + L4 + L6
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1880 "movq %%mm2, (%0, %1, 4) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1881 "movq (%%"REG_d", %1), %%mm2 \n\t" // L7
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1882 PAVGB(%%mm2, %%mm1) // L5+L7
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1883 PAVGB(%%mm0, %%mm1) // 2L6 + L5 + L7
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1884 "movq %%mm1, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1885 "movq (%%"REG_d", %1, 2), %%mm1 \n\t" // L8
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1886 PAVGB(%%mm1, %%mm0) // L6+L8
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1887 PAVGB(%%mm2, %%mm0) // 2L7 + L6 + L8
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1888 "movq %%mm0, (%%"REG_d", %1) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1889 "movq (%0, %1, 8), %%mm0 \n\t" // L9
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1890 PAVGB(%%mm0, %%mm2) // L7+L9
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1891 PAVGB(%%mm1, %%mm2) // 2L8 + L7 + L9
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1892 "movq %%mm2, (%%"REG_d", %1, 2) \n\t"
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1893 "movq %%mm1, (%2) \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1894
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1895 : : "r" (src), "r" ((long)stride), "r" (tmp)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1896 : "%"REG_a, "%"REG_d
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1897 );
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1898 #else
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1899 int a, b, c, x;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1900 src+= 4*stride;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1901
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1902 for(x=0; x<2; x++){
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1903 a= *(uint32_t*)&tmp[stride*0];
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1904 b= *(uint32_t*)&src[stride*0];
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1905 c= *(uint32_t*)&src[stride*1];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1906 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1907 *(uint32_t*)&src[stride*0]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1908
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1909 a= *(uint32_t*)&src[stride*2];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1910 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1911 *(uint32_t*)&src[stride*1]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1912
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1913 b= *(uint32_t*)&src[stride*3];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1914 c= (b&c) + (((b^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1915 *(uint32_t*)&src[stride*2]= (c|a) - (((c^a)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1916
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1917 c= *(uint32_t*)&src[stride*4];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1918 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1919 *(uint32_t*)&src[stride*3]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1920
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1921 a= *(uint32_t*)&src[stride*5];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1922 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1923 *(uint32_t*)&src[stride*4]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1924
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1925 b= *(uint32_t*)&src[stride*6];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1926 c= (b&c) + (((b^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1927 *(uint32_t*)&src[stride*5]= (c|a) - (((c^a)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1928
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1929 c= *(uint32_t*)&src[stride*7];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1930 a= (a&c) + (((a^c)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1931 *(uint32_t*)&src[stride*6]= (a|b) - (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1932
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1933 a= *(uint32_t*)&src[stride*8];
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1934 b= (a&b) + (((a^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1935 *(uint32_t*)&src[stride*7]= (c|b) - (((c^b)&0xFEFEFEFEUL)>>1);
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1936
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1937 *(uint32_t*)&tmp[stride*0]= c;
1158
71d890b5c13b faster C linear blend & interpolate deinterlacers
michaelni
parents: 1157
diff changeset
1938 src += 4;
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
1939 tmp += 4;
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1940 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1941 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1942 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1943
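The C fallback of the linear blend above averages four packed bytes per 32-bit word without unpacking them. The sketch below isolates that arithmetic so it can be checked on its own. The helper names (`avg_floor_u8x4`, `avg_ceil_u8x4`, `blend121_u8x4`) are hypothetical and do not appear in the source. Reading the floor/ceil pair as an approximate (above + 2*cur + below)/4 filter is an interpretation of the code, not something the source states.

```c
#include <stdint.h>

/* Per-byte floor((a+b)/2) on four packed 8-bit values.
 * Uses a+b == 2*(a&b) + (a^b). Masking with 0xFE clears the low bit of
 * every byte, so the shift cannot move a bit into the neighbouring byte. */
static inline uint32_t avg_floor_u8x4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Per-byte ceil((a+b)/2), from a+b == 2*(a|b) - (a^b). */
static inline uint32_t avg_ceil_u8x4(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Hypothetical helper: one output word of the blend, computed as
 * ceil_avg(cur, floor_avg(above, below)), the same order of operations
 * the listing uses for each stored line. */
static inline uint32_t blend121_u8x4(uint32_t above, uint32_t cur, uint32_t below)
{
    return avg_ceil_u8x4(cur, avg_floor_u8x4(above, below));
}
```

Pairing a floor average with a ceil average presumably keeps the two rounding errors from accumulating in the same direction. For example, `blend121_u8x4(0x00000000, 0x10101010, 0x20202020)` yields `0x10101010`.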
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1944 /**
1109
3644e555a20a doxy / cosmetics
michaelni
parents: 1029
diff changeset
1945 * Deinterlaces the given block by applying a median filter to every second line.
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1946 * Will be called for every 8x8 block and can read & write lines 4-15,
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1947 * lines 0-3 have already been passed through the deblock / dering filters, but can be read too
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1948 * lines 4-12 will be read into the deblocking filter and should be deinterlaced
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1949 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
1950 static inline void RENAME(deInterlaceMedian)(uint8_t src[], int stride)
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1951 {
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
1952 #ifdef HAVE_MMX
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
1953 src+= 4*stride;
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
1954 #ifdef HAVE_MMX2
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1955 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1956 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1957 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1958 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
1959 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1960
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1961 "movq (%0), %%mm0 \n\t" //
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1962 "movq (%%"REG_a", %1), %%mm2 \n\t" //
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1963 "movq (%%"REG_a"), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1964 "movq %%mm0, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1965 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1966 "pminub %%mm3, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1967 "pmaxub %%mm2, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1968 "pminub %%mm1, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1969 "movq %%mm0, (%%"REG_a") \n\t"
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1970
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1971 "movq (%0, %1, 4), %%mm0 \n\t" //
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1972 "movq (%%"REG_a", %1, 2), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1973 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1974 "pmaxub %%mm1, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1975 "pminub %%mm3, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1976 "pmaxub %%mm0, %%mm1 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1977 "pminub %%mm1, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1978 "movq %%mm2, (%%"REG_a", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1979
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1980 "movq (%%"REG_d"), %%mm2 \n\t" //
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1981 "movq (%%"REG_d", %1), %%mm1 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1982 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1983 "pmaxub %%mm0, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1984 "pminub %%mm3, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1985 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1986 "pminub %%mm0, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1987 "movq %%mm2, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1988
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1989 "movq (%%"REG_d", %1, 2), %%mm2 \n\t" //
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1990 "movq (%0, %1, 8), %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1991 "movq %%mm2, %%mm3 \n\t"
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1992 "pmaxub %%mm0, %%mm2 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1993 "pminub %%mm3, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1994 "pmaxub %%mm1, %%mm0 \n\t" //
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
1995 "pminub %%mm0, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1996 "movq %%mm2, (%%"REG_d", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1997
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1998
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
1999 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2000 : "%"REG_a, "%"REG_d
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2001 );
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2002
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2003 #else // MMX without MMX2
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2004 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2005 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2006 "lea (%%"REG_a", %1, 4), %%"REG_d" \n\t"
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2007 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2008 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2009 "pxor %%mm7, %%mm7 \n\t"
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2010
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2011 #define REAL_MEDIAN(a,b,c)\
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2012 "movq " #a ", %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2013 "movq " #b ", %%mm2 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2014 "movq " #c ", %%mm1 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2015 "movq %%mm0, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2016 "movq %%mm1, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2017 "movq %%mm2, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2018 "psubusb %%mm1, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2019 "psubusb %%mm2, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2020 "psubusb %%mm0, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2021 "pcmpeqb %%mm7, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2022 "pcmpeqb %%mm7, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2023 "pcmpeqb %%mm7, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2024 "movq %%mm3, %%mm6 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2025 "pxor %%mm4, %%mm3 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2026 "pxor %%mm5, %%mm4 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2027 "pxor %%mm6, %%mm5 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2028 "por %%mm3, %%mm1 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2029 "por %%mm4, %%mm2 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2030 "por %%mm5, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2031 "pand %%mm2, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2032 "pand %%mm1, %%mm0 \n\t"\
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2033 "movq %%mm0, " #b " \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2034 #define MEDIAN(a,b,c) REAL_MEDIAN(a,b,c)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2035
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2036 MEDIAN((%0), (%%REGa), (%%REGa, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2037 MEDIAN((%%REGa, %1), (%%REGa, %1, 2), (%0, %1, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2038 MEDIAN((%0, %1, 4), (%%REGd), (%%REGd, %1))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2039 MEDIAN((%%REGd, %1), (%%REGd, %1, 2), (%0, %1, 8))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2040
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2041 : : "r" (src), "r" ((long)stride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2042 : "%"REG_a, "%"REG_d
107
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2043 );
bd163e13a0fb minor cleanups
michael
parents: 106
diff changeset
2044 #endif // MMX
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2045 #else
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2046 int x, y;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
2047 src+= 4*stride;
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2048 // FIXME - there should be a way to do a few columns in parallel like w/mmx
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2049 for(x=0; x<8; x++)
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2050 {
1029
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2051 uint8_t *colsrc = src;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2052 for (y=0; y<4; y++)
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2053 {
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2054 int a, b, c, d, e, f;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2055 a = colsrc[0 ];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2056 b = colsrc[stride ];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2057 c = colsrc[stride*2];
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2058 d = (a-b)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2059 e = (b-c)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2060 f = (c-a)>>31;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2061 colsrc[stride ] = (a|(d^f)) & (b|(d^e)) & (c|(e^f));
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2062 colsrc += stride*2;
804cc05a3f61 C implementation of the median deinterlacer (seems to be the only one
rfelker
parents: 957
diff changeset
2063 }
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2064 src++;
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2065 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2066 #endif
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2067 }
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
2068
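The median deinterlacer above computes a median of three vertically adjacent pixels in two branch-free ways: a mask form (the C fallback, and in spirit the plain-MMX macro) and a min/max form (the MMX2 path). The scalar sketch below shows both. The function names are hypothetical. As a deliberate deviation, the masks are built with `-(x < y)` rather than the source's `(x-y)>>31`, since right-shifting a negative `int` is implementation-defined in C.

```c
#include <stdint.h>

/* Mask-based median of three values in 0..255.
 * Each mask is 0 or -1 (all bits set). A value is the median exactly
 * when its two masks agree (xor == 0). Otherwise it is ORed with all
 * ones and drops out of the final AND. */
static inline uint8_t median3_mask(int a, int b, int c)
{
    int d = -(a < b);
    int e = -(b < c);
    int f = -(c < a);
    return (uint8_t)((a | (d ^ f)) & (b | (d ^ e)) & (c | (e ^ f)));
}

/* Min/max form: min(max(a,b), max(min(a,b), c)), mirroring the
 * pmaxub/pminub instruction sequence of the MMX2 path. */
static inline uint8_t median3_minmax(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t hi = a > b ? a : b;
    uint8_t lo = a > b ? b : a;
    uint8_t t  = lo > c ? lo : c;
    return hi < t ? hi : t;
}
```

Both functions return the same value for any input triple in 0..255. For instance, `median3_mask(3, 1, 2)` and `median3_minmax(3, 1, 2)` both give 2.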
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
2069 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2070 /**
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2071 * Transposes and shifts the given 8x8 block into dst1 and dst2.
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2072 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
2073 static inline void RENAME(transpose1)(uint8_t *dst1, uint8_t *dst2, uint8_t *src, int srcStride)
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2074 {
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2075 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2076 "lea (%0, %1), %%"REG_a" \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2077 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2078 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2079 "movq (%0), %%mm0 \n\t" // 12345678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2080 "movq (%%"REG_a"), %%mm1 \n\t" // abcdefgh
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2081 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2082 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2083 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2084
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2085 "movq (%%"REG_a", %1), %%mm1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2086 "movq (%%"REG_a", %1, 2), %%mm3 \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2087 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2088 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2089 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2090
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2091 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2092 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2093 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2094 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2095 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2096 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2097
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2098 "movd %%mm0, 128(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2099 "psrlq $32, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2100 "movd %%mm0, 144(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2101 "movd %%mm3, 160(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2102 "psrlq $32, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2103 "movd %%mm3, 176(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2104 "movd %%mm3, 48(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2105 "movd %%mm2, 192(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2106 "movd %%mm2, 64(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2107 "psrlq $32, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2108 "movd %%mm2, 80(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2109 "movd %%mm1, 96(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2110 "psrlq $32, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2111 "movd %%mm1, 112(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2112
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2113 "lea (%%"REG_a", %1, 4), %%"REG_a" \n\t"
789
54079a650ba8 using fewer registers (fixes compilation bug hopefully)
michael
parents: 788
diff changeset
2114
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2115 "movq (%0, %1, 4), %%mm0 \n\t" // 12345678
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2116 "movq (%%"REG_a"), %%mm1 \n\t" // abcdefgh
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2117 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2118 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2119 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2120
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2121 "movq (%%"REG_a", %1), %%mm1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2122 "movq (%%"REG_a", %1, 2), %%mm3 \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2123 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2124 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2125 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2126
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2127 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2128 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2129 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2130 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2131 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2132 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2133
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2134 "movd %%mm0, 132(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2135 "psrlq $32, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2136 "movd %%mm0, 148(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2137 "movd %%mm3, 164(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2138 "psrlq $32, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2139 "movd %%mm3, 180(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2140 "movd %%mm3, 52(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2141 "movd %%mm2, 196(%2) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2142 "movd %%mm2, 68(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2143 "psrlq $32, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2144 "movd %%mm2, 84(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2145 "movd %%mm1, 100(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2146 "psrlq $32, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2147 "movd %%mm1, 116(%3) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2148
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2149
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2150 :: "r" (src), "r" ((long)srcStride), "r" (dst1), "r" (dst2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2151 : "%"REG_a
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2152 );
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2153 }
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2154
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2155 /**
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2156 * transposes the given 8x8 block (src rows are 16 bytes apart, dst rows are dstStride bytes apart)
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2157 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
2158 static inline void RENAME(transpose2)(uint8_t *dst, int dstStride, uint8_t *src)
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2159 {
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2160 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2161 "lea (%0, %1), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2162 "lea (%%"REG_a",%1,4), %%"REG_d"\n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2163 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2164 // %0 eax eax+%1 eax+2%1 %0+4%1 edx edx+%1 edx+2%1 %0+8%1 edx+4%1
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2165 "movq (%2), %%mm0 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2166 "movq 16(%2), %%mm1 \n\t" // abcdefgh
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2167 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2168 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2169 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2170
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2171 "movq 32(%2), %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2172 "movq 48(%2), %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2173 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2174 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2175 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2176
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2177 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2178 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2179 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2180 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2181 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2182 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2183
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2184 "movd %%mm0, (%0) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2185 "psrlq $32, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2186 "movd %%mm0, (%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2187 "movd %%mm3, (%%"REG_a", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2188 "psrlq $32, %%mm3 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2189 "movd %%mm3, (%%"REG_a", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2190 "movd %%mm2, (%0, %1, 4) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2191 "psrlq $32, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2192 "movd %%mm2, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2193 "movd %%mm1, (%%"REG_d", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2194 "psrlq $32, %%mm1 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2195 "movd %%mm1, (%%"REG_d", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2196
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2197
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2198 "movq 64(%2), %%mm0 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2199 "movq 80(%2), %%mm1 \n\t" // abcdefgh
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2200 "movq %%mm0, %%mm2 \n\t" // 12345678
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2201 "punpcklbw %%mm1, %%mm0 \n\t" // 1a2b3c4d
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2202 "punpckhbw %%mm1, %%mm2 \n\t" // 5e6f7g8h
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2203
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2204 "movq 96(%2), %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2205 "movq 112(%2), %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2206 "movq %%mm1, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2207 "punpcklbw %%mm3, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2208 "punpckhbw %%mm3, %%mm4 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2209
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2210 "movq %%mm0, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2211 "punpcklwd %%mm1, %%mm0 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2212 "punpckhwd %%mm1, %%mm3 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2213 "movq %%mm2, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2214 "punpcklwd %%mm4, %%mm2 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2215 "punpckhwd %%mm4, %%mm1 \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2216
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2217 "movd %%mm0, 4(%0) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2218 "psrlq $32, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2219 "movd %%mm0, 4(%%"REG_a") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2220 "movd %%mm3, 4(%%"REG_a", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2221 "psrlq $32, %%mm3 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2222 "movd %%mm3, 4(%%"REG_a", %1, 2) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2223 "movd %%mm2, 4(%0, %1, 4) \n\t"
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2224 "psrlq $32, %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2225 "movd %%mm2, 4(%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2226 "movd %%mm1, 4(%%"REG_d", %1) \n\t"
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2227 "psrlq $32, %%mm1 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2228 "movd %%mm1, 4(%%"REG_d", %1, 2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2229
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2230 :: "r" (dst), "r" ((long)dstStride), "r" (src)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2231 : "%"REG_a, "%"REG_d
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2232 );
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2233 }
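For readers who do not want to trace the punpck* interleaves by hand, the effect of transpose2 can be sketched in plain C. This is only an illustrative reference, not code from the file: the name transpose2_ref, the SRC_STRIDE_ASSUMED constant and the self-check helper are hypothetical, and the 16-byte source row pitch is inferred from the 16(%2), 32(%2), ... load offsets in the assembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Source row pitch in bytes. Assumption: inferred from the 16(%2), 32(%2), ...
 * offsets used by the MMX loads; the file does not name this constant. */
#define SRC_STRIDE_ASSUMED 16

/* Hypothetical scalar reference: element (x, y) of the source block
 * becomes element (y, x) of the destination block. */
static void transpose2_ref(uint8_t *dst, int dstStride, const uint8_t *src)
{
    int x, y;
    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++)
            dst[y * dstStride + x] = src[x * SRC_STRIDE_ASSUMED + y];
}

/* Self-check: fill the source with distinct values and verify that every
 * byte lands at its mirrored position in the destination. */
static void transpose2_ref_selfcheck(void)
{
    uint8_t src[8 * SRC_STRIDE_ASSUMED];
    uint8_t dst[8 * 8];
    int x, y;

    for (y = 0; y < 8; y++)
        for (x = 0; x < SRC_STRIDE_ASSUMED; x++)
            src[y * SRC_STRIDE_ASSUMED + x] = (uint8_t)(y * SRC_STRIDE_ASSUMED + x);

    transpose2_ref(dst, 8, src);

    for (y = 0; y < 8; y++)
        for (x = 0; x < 8; x++)
            assert(dst[y * 8 + x] == src[x * SRC_STRIDE_ASSUMED + y]);
}
```

The MMX version produces the same layout in two passes of four source rows each, writing 4 bytes per destination row with movd. A scalar loop like this one is mainly useful as a comparison target when checking the assembly output.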
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
2234 #endif
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2235 //static long test=0;
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
2236
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2237 #ifndef HAVE_ALTIVEC
943
0566d1a8426f 10l (int i)
michael
parents: 941
diff changeset
2238 static inline void RENAME(tempNoiseReducer)(uint8_t *src, int stride,
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2239 uint8_t *tempBlured, uint32_t *tempBluredPast, int *maxNoise)
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2240 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2241 // to save a register (FIXME do this outside of the loops)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2242 tempBluredPast[127]= maxNoise[0];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2243 tempBluredPast[128]= maxNoise[1];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2244 tempBluredPast[129]= maxNoise[2];
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2245
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2246 #define FAST_L2_DIFF
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2247 //#define L1_DIFF // you should change the thresholds too if you try that one
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2248 #if defined (HAVE_MMX2) || defined (HAVE_3DNOW)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2249 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2250 "lea (%2, %2, 2), %%"REG_a" \n\t" // 3*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2251 "lea (%2, %2, 4), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2252 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2253 // 0 1 2 3 4 5 6 7 8 9
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
2254 // %x %x+%2 %x+2%2 %x+eax %x+4%2 %x+edx %x+2eax %x+ecx %x+8%2
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2255 //FIXME reorder?
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2256 #ifdef L1_DIFF //needs mmx2
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2257 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2258 "psadbw (%1), %%mm0 \n\t" // |L0-R0|
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2259 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2260 "psadbw (%1, %2), %%mm1 \n\t" // |L1-R1|
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2261 "movq (%0, %2, 2), %%mm2 \n\t" // L2
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2262 "psadbw (%1, %2, 2), %%mm2 \n\t" // |L2-R2|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2263 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2264 "psadbw (%1, %%"REG_a"), %%mm3 \n\t" // |L3-R3|
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2265
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2266 "movq (%0, %2, 4), %%mm4 \n\t" // L4
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2267 "paddw %%mm1, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2268 "psadbw (%1, %2, 4), %%mm4 \n\t" // |L4-R4|
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2269 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2270 "paddw %%mm2, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2271 "psadbw (%1, %%"REG_d"), %%mm5 \n\t" // |L5-R5|
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2272 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2273 "paddw %%mm3, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2274 "psadbw (%1, %%"REG_a", 2), %%mm6 \n\t" // |L6-R6|
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2275 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2276 "paddw %%mm4, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2277 "psadbw (%1, %%"REG_c"), %%mm7 \n\t" // |L7-R7|
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2278 "paddw %%mm5, %%mm6 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2279 "paddw %%mm7, %%mm6 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2280 "paddw %%mm6, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2281 #else
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2282 #if defined (FAST_L2_DIFF)
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2283 "pcmpeqb %%mm7, %%mm7 \n\t"
210
c2b6d68a0671 mangle for win32 in postproc
atmos4
parents: 182
diff changeset
2284 "movq "MANGLE(b80)", %%mm6 \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2285 "pxor %%mm0, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2286 #define REAL_L2_DIFF_CORE(a, b)\
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2287 "movq " #a ", %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2288 "movq " #b ", %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2289 "pxor %%mm7, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2290 PAVGB(%%mm2, %%mm5)\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2291 "paddb %%mm6, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2292 "movq %%mm5, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2293 "psllw $8, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2294 "pmaddwd %%mm5, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2295 "pmaddwd %%mm2, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2296 "paddd %%mm2, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2297 "psrld $14, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2298 "paddd %%mm5, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2299
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2300 #else
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2301 "pxor %%mm7, %%mm7 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2302 "pxor %%mm0, %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2303 #define REAL_L2_DIFF_CORE(a, b)\
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2304 "movq " #a ", %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2305 "movq " #b ", %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2306 "movq %%mm5, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2307 "movq %%mm2, %%mm3 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2308 "punpcklbw %%mm7, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2309 "punpckhbw %%mm7, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2310 "punpcklbw %%mm7, %%mm2 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2311 "punpckhbw %%mm7, %%mm3 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2312 "psubw %%mm2, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2313 "psubw %%mm3, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2314 "pmaddwd %%mm5, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2315 "pmaddwd %%mm1, %%mm1 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2316 "paddd %%mm1, %%mm5 \n\t"\
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2317 "paddd %%mm5, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2318
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2319 #endif
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2320
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2321 #define L2_DIFF_CORE(a, b) REAL_L2_DIFF_CORE(a, b)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2322
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2323 L2_DIFF_CORE((%0), (%1))
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2324 L2_DIFF_CORE((%0, %2), (%1, %2))
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2325 L2_DIFF_CORE((%0, %2, 2), (%1, %2, 2))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2326 L2_DIFF_CORE((%0, %%REGa), (%1, %%REGa))
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2327 L2_DIFF_CORE((%0, %2, 4), (%1, %2, 4))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2328 L2_DIFF_CORE((%0, %%REGd), (%1, %%REGd))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2329 L2_DIFF_CORE((%0, %%REGa,2), (%1, %%REGa,2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2330 L2_DIFF_CORE((%0, %%REGc), (%1, %%REGc))
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2331
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2332 #endif
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2333
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2334 "movq %%mm0, %%mm4 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2335 "psrlq $32, %%mm0 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2336 "paddd %%mm0, %%mm4 \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2337 "movd %%mm4, %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2338 "shll $2, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2339 "mov %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2340 "addl -4(%%"REG_d"), %%ecx \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2341 "addl 4(%%"REG_d"), %%ecx \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2342 "addl -1024(%%"REG_d"), %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2343 "addl $4, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2344 "addl 1024(%%"REG_d"), %%ecx \n\t"
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2345 "shrl $3, %%ecx \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2346 "movl %%ecx, (%%"REG_d") \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2347
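A scalar reading of the integer sequence just above (`shll $2` through `movl %%ecx, (...)`), offered as a sketch rather than as the file's own C path: the block's difference score is weighted by 4, the four neighbouring scores are added with a rounding term of 4, and the sum is divided by 8 and stored back. The helper name `smooth_block_diff` and the parameter `row_ints` are hypothetical; the value 256 for `row_ints` is inferred from the 1024-byte offsets, assuming 32-bit entries.

```c
#include <stdint.h>

/* Hypothetical helper: smooth one block's difference score with its four
 * neighbours in a grid of 32-bit scores.
 * past     : points at this block's slot inside the grid
 * d        : raw difference score for the block (the value held in ecx)
 * row_ints : distance between vertically adjacent slots, in ints
 *            (assumed 256, from the +/-1024 byte offsets in the asm) */
static uint32_t smooth_block_diff(uint32_t *past, uint32_t d, int row_ints)
{
    d = (4 * d
         + past[-1] + past[1]                 /* left and right neighbours */
         + past[-row_ints] + past[row_ints]   /* upper and lower neighbours */
         + 4) >> 3;                           /* +4 rounds the divide by 8 */
    *past = d;                                /* store back, as the movl does */
    return d;
}
```

With all four neighbours equal to the block's own score the result is unchanged, which is one quick way to sanity-check the weights.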
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2348 // "mov %3, %%"REG_c" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2349 // "mov %%"REG_c", test \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2350 // "jmp 4f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2351 "cmpl 512(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2352 " jb 2f \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2353 "cmpl 516(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2354 " jb 1f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2355
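The compare-and-branch sequence around this point can be read as a three-threshold dispatch. The sketch below is an interpretation, not code from the file: it assumes the three values compared at offsets 508, 512 and 516 are consecutive 32-bit thresholds, here called `max_noise[0..2]`, and that `jb` is an unsigned "below" test. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the four outcomes of the branch sequence. */
enum blend_mode {
    COPY_CURRENT = 0,  /* fall-through: current block is copied over the reference */
    AVG_ONCE     = 1,  /* label 1: one averaging pass  */
    AVG_TWICE    = 2,  /* label 2: two averaging passes */
    AVG_THRICE   = 3   /* label 3: three averaging passes */
};

/* d is the smoothed difference score; max_noise holds three ascending
 * thresholds (assumed layout: offsets 508, 512, 516 -> indices 0, 1, 2). */
static enum blend_mode pick_blend(uint32_t d, const uint32_t max_noise[3])
{
    if (d < max_noise[1]) {
        return d < max_noise[0] ? AVG_THRICE : AVG_TWICE;
    }
    return d < max_noise[2] ? AVG_ONCE : COPY_CURRENT;
}
```

The smaller the difference, the more averaging passes are taken, so blocks that barely changed are pulled hardest toward the reference.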
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2356 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2357 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2358 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2359 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2360 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2361 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2362 "movq (%0, %2, 4), %%mm4 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2363 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2364 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2365 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2366 "movq %%mm0, (%1) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2367 "movq %%mm1, (%1, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2368 "movq %%mm2, (%1, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2369 "movq %%mm3, (%1, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2370 "movq %%mm4, (%1, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2371 "movq %%mm5, (%1, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2372 "movq %%mm6, (%1, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2373 "movq %%mm7, (%1, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2374 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2375
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2376 "1: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2377 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2378 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2379 "movq (%0), %%mm0 \n\t" // L0
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2380 PAVGB((%1), %%mm0) // L0
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2381 "movq (%0, %2), %%mm1 \n\t" // L1
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2382 PAVGB((%1, %2), %%mm1) // L1
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2383 "movq (%0, %2, 2), %%mm2 \n\t" // L2
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2384 PAVGB((%1, %2, 2), %%mm2) // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2385 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2386 PAVGB((%1, %%REGa), %%mm3) // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2387 "movq (%0, %2, 4), %%mm4 \n\t" // L4
363
ff766a367974 3dnow temporal denoiser bugfix by Rémi Guyomarch <rguyom@pobox.com>
michael
parents: 334
diff changeset
2388 PAVGB((%1, %2, 4), %%mm4) // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2389 "movq (%0, %%"REG_d"), %%mm5 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2390 PAVGB((%1, %%REGd), %%mm5) // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2391 "movq (%0, %%"REG_a", 2), %%mm6 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2392 PAVGB((%1, %%REGa, 2), %%mm6) // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2393 "movq (%0, %%"REG_c"), %%mm7 \n\t" // L7
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2394 PAVGB((%1, %%REGc), %%mm7) // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2395 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2396 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2397 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2398 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2399 "movq %%mm4, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2400 "movq %%mm5, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2401 "movq %%mm6, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2402 "movq %%mm7, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2403 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2404 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2405 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2406 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2407 "movq %%mm4, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2408 "movq %%mm5, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2409 "movq %%mm6, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2410 "movq %%mm7, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2411 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2412
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2413 "2: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2414 "cmpl 508(%%"REG_d"), %%ecx \n\t"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2415 " jb 3f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2416
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2417 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2418 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2419 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2420 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2421 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2422 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2423 "movq (%1), %%mm4 \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2424 "movq (%1, %2), %%mm5 \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2425 "movq (%1, %2, 2), %%mm6 \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2426 "movq (%1, %%"REG_a"), %%mm7 \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2427 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2428 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2429 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2430 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2431 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2432 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2433 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2434 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2435 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2436 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2437 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2438 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2439 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2440 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2441 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2442 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2443
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2444 "movq (%0, %2, 4), %%mm0 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2445 "movq (%0, %%"REG_d"), %%mm1 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2446 "movq (%0, %%"REG_a", 2), %%mm2 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2447 "movq (%0, %%"REG_c"), %%mm3 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2448 "movq (%1, %2, 4), %%mm4 \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2449 "movq (%1, %%"REG_d"), %%mm5 \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2450 "movq (%1, %%"REG_a", 2), %%mm6 \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2451 "movq (%1, %%"REG_c"), %%mm7 \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2452 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2453 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2454 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2455 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2456 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2457 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2458 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2459 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2460 "movq %%mm0, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2461 "movq %%mm1, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2462 "movq %%mm2, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2463 "movq %%mm3, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2464 "movq %%mm0, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2465 "movq %%mm1, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2466 "movq %%mm2, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2467 "movq %%mm3, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2468 "jmp 4f \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2469
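The branches in this part of the listing apply `PAVGB` against the reference rows (the `R` loads) one, two, or three times before storing the result to both buffers. A per-byte scalar sketch of that repeated averaging follows, assuming `PAVGB` expands to a rounding-up byte average, `(a + b + 1) >> 1`, as `pavgb` and `pavgusb` compute. The function names are hypothetical.

```c
#include <stdint.h>

/* Rounding-up average of two bytes, the per-lane result of pavgb/pavgusb. */
static uint8_t avg_round_up(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: average the current pixel with the reference pixel
 * `passes` times. Each pass halves the remaining weight of `cur`, so the
 * result is roughly 1/2, 3/4 or 7/8 reference for 1, 2 or 3 passes. */
static uint8_t blend_toward_ref(uint8_t cur, uint8_t ref, int passes)
{
    uint8_t v = cur;
    for (int i = 0; i < passes; i++) {
        v = avg_round_up(v, ref);
    }
    return v;
}
```

Because each pass rounds up, the result can differ by one from a single weighted sum such as `(7 * ref + cur + 4) >> 3`; the sketch follows the pass-by-pass form used in the assembly.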
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2470 "3: \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2471 "lea (%%"REG_a", %2, 2), %%"REG_d" \n\t" // 5*stride
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2472 "lea (%%"REG_d", %2, 2), %%"REG_c" \n\t" // 7*stride
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2473 "movq (%0), %%mm0 \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2474 "movq (%0, %2), %%mm1 \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2475 "movq (%0, %2, 2), %%mm2 \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2476 "movq (%0, %%"REG_a"), %%mm3 \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2477 "movq (%1), %%mm4 \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2478 "movq (%1, %2), %%mm5 \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2479 "movq (%1, %2, 2), %%mm6 \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2480 "movq (%1, %%"REG_a"), %%mm7 \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2481 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2482 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2483 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2484 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2485 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2486 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2487 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2488 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2489 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2490 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2491 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2492 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2493 "movq %%mm0, (%1) \n\t" // R0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2494 "movq %%mm1, (%1, %2) \n\t" // R1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2495 "movq %%mm2, (%1, %2, 2) \n\t" // R2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2496 "movq %%mm3, (%1, %%"REG_a") \n\t" // R3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2497 "movq %%mm0, (%0) \n\t" // L0
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2498 "movq %%mm1, (%0, %2) \n\t" // L1
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2499 "movq %%mm2, (%0, %2, 2) \n\t" // L2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2500 "movq %%mm3, (%0, %%"REG_a") \n\t" // L3
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2501
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2502 "movq (%0, %2, 4), %%mm0 \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2503 "movq (%0, %%"REG_d"), %%mm1 \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2504 "movq (%0, %%"REG_a", 2), %%mm2 \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2505 "movq (%0, %%"REG_c"), %%mm3 \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2506 "movq (%1, %2, 4), %%mm4 \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2507 "movq (%1, %%"REG_d"), %%mm5 \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2508 "movq (%1, %%"REG_a", 2), %%mm6 \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2509 "movq (%1, %%"REG_c"), %%mm7 \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2510 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2511 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2512 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2513 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2514 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2515 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2516 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2517 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2518 PAVGB(%%mm4, %%mm0)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2519 PAVGB(%%mm5, %%mm1)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2520 PAVGB(%%mm6, %%mm2)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2521 PAVGB(%%mm7, %%mm3)
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2522 "movq %%mm0, (%1, %2, 4) \n\t" // R4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2523 "movq %%mm1, (%1, %%"REG_d") \n\t" // R5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2524 "movq %%mm2, (%1, %%"REG_a", 2) \n\t" // R6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2525 "movq %%mm3, (%1, %%"REG_c") \n\t" // R7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2526 "movq %%mm0, (%0, %2, 4) \n\t" // L4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2527 "movq %%mm1, (%0, %%"REG_d") \n\t" // L5
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2528 "movq %%mm2, (%0, %%"REG_a", 2) \n\t" // L6
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2529 "movq %%mm3, (%0, %%"REG_c") \n\t" // L7
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2530
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2531 "4: \n\t"
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2532
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2533 :: "r" (src), "r" (tempBlured), "r"((long)stride), "m" (tempBluredPast)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2534 : "%"REG_a, "%"REG_d, "%"REG_c, "memory"
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2535 );
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2536 //printf("%d\n", test);
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2537 #else
788
425d71e81c37 fix compilation on non-x86 with gcc 2.95
colin
parents: 787
diff changeset
2538 {
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2539 int y;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2540 int d=0;
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2541 // int sysd=0;
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2542 int i;
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2543
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2544 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2545 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2546 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2547 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2548 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2549 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2550 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2551 int d1=ref - cur;
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2552 // if(x==0 || x==7) d1+= d1>>1;
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2553 // if(y==0 || y==7) d1+= d1>>1;
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2554 // d+= ABS(d1);
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2555 d+= d1*d1;
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2556 // sysd+= d1;
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2557 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2558 }
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2559 i=d;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2560 d= (
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2561 4*d
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2562 +(*(tempBluredPast-256))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2563 +(*(tempBluredPast-1))+ (*(tempBluredPast+1))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2564 +(*(tempBluredPast+256))
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2565 +4)>>3;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2566 *tempBluredPast=i;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2567 // ((*tempBluredPast)*3 + d + 2)>>2;
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
2568
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2569 //printf("%d %d %d\n", maxNoise[0], maxNoise[1], maxNoise[2]);
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2570 /*
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2571 Switch between
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2572 1 0 0 0 0 0 0 (0)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2573 64 32 16 8 4 2 1 (1)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2574 64 48 36 27 20 15 11 (33) (approx)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2575 64 56 49 43 37 33 29 (200) (approx)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2576 */
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2577 if(d > maxNoise[1])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2578 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2579 if(d < maxNoise[2])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2580 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2581 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2582 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2583 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2584 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2585 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2586 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2587 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2588 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2589 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2590 (ref + cur + 1)>>1;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2591 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2592 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2593 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2594 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2595 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2596 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2597 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2598 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2599 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2600 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2601 tempBlured[ x + y*stride ]= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2602 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2603 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2604 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2605 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2606 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2607 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2608 if(d < maxNoise[0])
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2609 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2610 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2611 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2612 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2613 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2614 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2615 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2616 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2617 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2618 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2619 (ref*7 + cur + 4)>>3;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2620 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2621 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2622 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2623 else
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2624 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2625 for(y=0; y<8; y++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2626 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2627 int x;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2628 for(x=0; x<8; x++)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2629 {
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2630 int ref= tempBlured[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2631 int cur= src[ x + y*stride ];
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2632 tempBlured[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2633 src[ x + y*stride ]=
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2634 (ref*3 + cur + 2)>>2;
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2635 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2636 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2637 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2638 }
788
425d71e81c37 fix compilation on non-x86 with gcc 2.95
colin
parents: 787
diff changeset
2639 }
157
bc12fd7e6153 temp denoiser changes: (a-b)^2 instead of |a-b| and MMX2/3DNOW version
michael
parents: 156
diff changeset
2640 #endif
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2641 }
2041
b996fbe0a7e7 Newer version, using a vectorized version of the
michael
parents: 2040
diff changeset
2642 #endif //HAVE_ALTIVEC
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
2643
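The C fallback of the temporal noise reducer above can be restated per pixel. What follows is a minimal sketch, not the file's own code: the helper names block_ssd, smooth_noise and blend_pixel are hypothetical, and the threshold values in the usage note are made up for illustration. It assumes the same three-entry maxNoise array and the same rounding as the loops above.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over one 8x8 block (hypothetical helper,
 * mirrors the "d+= d1*d1" loop above). */
static int block_ssd(const uint8_t *ref, const uint8_t *cur, int stride)
{
    int d = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            int d1 = ref[x + y * stride] - cur[x + y * stride];
            d += d1 * d1;
        }
    }
    return d;
}

/* Smooth the block difference with the stored differences of the four
 * neighbouring blocks: weight 4 for this block, 1 for each neighbour,
 * rounded and divided by 8. */
static int smooth_noise(int d, int up, int left, int right, int down)
{
    return (4 * d + up + left + right + down + 4) >> 3;
}

/* Pick the blend weight for one pixel from the smoothed difference d.
 * ref is the running (blurred) reference, cur is the current frame. */
static int blend_pixel(int ref, int cur, int d, const int maxNoise[3])
{
    assert(ref >= 0 && ref <= 255 && cur >= 0 && cur <= 255);
    if (d > maxNoise[1]) {
        if (d < maxNoise[2])
            return (ref + cur + 1) >> 1;   /* 1:1 blend */
        return cur;                        /* too different: restart from cur */
    }
    if (d < maxNoise[0])
        return (ref * 7 + cur + 4) >> 3;   /* 7:1 blend, strongest smoothing */
    return (ref * 3 + cur + 2) >> 2;       /* 3:1 blend */
}
```

With illustrative thresholds {150, 200, 400}, a reference pixel of 100 and a current pixel of 108 give 101, 102, 104 and 108 for d = 100, 180, 300 and 500 respectively, so the output follows the current frame more closely as the block difference grows. In the highest-difference case the original only refreshes tempBlured and leaves src as it is, which is the same value this sketch returns.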
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2644 #ifdef HAVE_MMX
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2645 /**
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2646 * accurate deblock filter
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2647 */
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2648 static always_inline void RENAME(do_a_deblock)(uint8_t *src, int step, int stride, PPContext *c){
2642
240e17c3cb2d GCC4 fix by (Keenan Pepper (keenanpepper gmail com)
michael
parents: 2527
diff changeset
2649 int64_t dc_mask, eq_mask, both_masks;
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2650 int64_t sums[10*8*2];
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2651 src+= step*3; // src points to the beginning of the 8x8 block
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2652 //START_TIMER
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2653 asm volatile(
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2654 "movq %0, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2655 "movq %1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2656 : : "m" (c->mmxDcOffset[c->nonBQP]), "m" (c->mmxDcThreshold[c->nonBQP])
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2657 );
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2658
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2659 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2660 "lea (%2, %3), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2661 // 0 1 2 3 4 5 6 7 8 9
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2662 // %1 eax eax+%2 eax+2%2 %1+4%2 ecx ecx+%2 ecx+2%2 %1+8%2 ecx+4%2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2663
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2664 "movq (%2), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2665 "movq (%%"REG_a"), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2666 "movq %%mm1, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2667 "movq %%mm1, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2668 "psubb %%mm1, %%mm0 \n\t" // mm0 = difference
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2669 "paddb %%mm7, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2670 "pcmpgtb %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2671
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2672 "movq (%%"REG_a",%3), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2673 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2674 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2675 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2676 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2677 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2678 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2679
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2680 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2681 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2682 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2683 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2684 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2685 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2686 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2687
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2688 "lea (%%"REG_a", %3, 4), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2689
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2690 "movq (%2, %3, 4), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2691 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2692 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2693 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2694 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2695 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2696 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2697
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2698 "movq (%%"REG_a"), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2699 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2700 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2701 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2702 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2703 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2704 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2705
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2706 "movq (%%"REG_a", %3), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2707 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2708 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2709 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2710 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2711 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2712 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2713
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2714 "movq (%%"REG_a", %3, 2), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2715 PMAXUB(%%mm1, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2716 PMINUB(%%mm1, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2717 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2718 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2719 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2720 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2721
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2722 "movq (%2, %3, 8), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2723 PMAXUB(%%mm2, %%mm4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2724 PMINUB(%%mm2, %%mm3, %%mm5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2725 "psubb %%mm2, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2726 "paddb %%mm7, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2727 "pcmpgtb %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2728 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2729
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2730 "movq (%%"REG_a", %3, 4), %%mm1 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2731 "psubb %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2732 "paddb %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2733 "pcmpgtb %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2734 "paddb %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2735 "psubusb %%mm3, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2736
2276
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2737 "pxor %%mm6, %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2738 "movq %4, %%mm7 \n\t" // QP,..., QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2739 "paddusb %%mm7, %%mm7 \n\t" // 2QP ... 2QP
2276
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2740 "psubusb %%mm4, %%mm7 \n\t" // Diff >=2QP -> 0
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2741 "pcmpeqb %%mm6, %%mm7 \n\t" // Diff < 2QP -> 0
185f3b18ec1f 100l (signed vs. unsigend)
michael
parents: 2043
diff changeset
2742 "pcmpeqb %%mm6, %%mm7 \n\t" // Diff < 2QP -> -1
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2743 "movq %%mm7, %1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2744
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2745 "movq %5, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2746 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2747 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2748 "punpcklbw %%mm7, %%mm7 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2749 "psubb %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2750 "pcmpgtb %%mm7, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2751 "movq %%mm6, %0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2752
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2753 : "=m" (eq_mask), "=m" (dc_mask)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2754 : "r" (src), "r" ((long)step), "m" (c->pQPb), "m"(c->ppMode.flatnessThreshold)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2755 : "%"REG_a
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2756 );
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2757
2642
240e17c3cb2d GCC4 fix by (Keenan Pepper (keenanpepper gmail com)
michael
parents: 2527
diff changeset
2758 both_masks = dc_mask & eq_mask;
240e17c3cb2d GCC4 fix by (Keenan Pepper (keenanpepper gmail com)
michael
parents: 2527
diff changeset
2759
240e17c3cb2d GCC4 fix by (Keenan Pepper (keenanpepper gmail com)
michael
parents: 2527
diff changeset
2760 if(both_masks){
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2761 long offset= -8*step;
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2762 int64_t *temp_sums= sums;
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2763
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2764 asm volatile(
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2765 "movq %2, %%mm0 \n\t" // QP,..., QP
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2766 "pxor %%mm4, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2767
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2768 "movq (%0), %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2769 "movq (%0, %1), %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2770 "movq %%mm5, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2771 "movq %%mm6, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2772 "psubusb %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2773 "psubusb %%mm1, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2774 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2775 "psubusb %%mm2, %%mm0 \n\t" // diff >= QP -> 0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2776 "pcmpeqb %%mm4, %%mm0 \n\t" // diff >= QP -> FF
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2777
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2778 "pxor %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2779 "pand %%mm0, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2780 "pxor %%mm1, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2781 // 0:QP 6:First
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2782
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2783 "movq (%0, %1, 8), %%mm5 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2784 "add %1, %0 \n\t" // %0 points to line 1 not 0
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2785 "movq (%0, %1, 8), %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2786 "movq %%mm5, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2787 "movq %%mm7, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2788 "psubusb %%mm7, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2789 "psubusb %%mm1, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2790 "por %%mm5, %%mm2 \n\t" // ABS Diff of lines
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2791 "movq %2, %%mm0 \n\t" // QP,..., QP
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2792 "psubusb %%mm2, %%mm0 \n\t" // diff >= QP -> 0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2793 "pcmpeqb %%mm4, %%mm0 \n\t" // diff >= QP -> FF
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2794
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2795 "pxor %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2796 "pand %%mm0, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2797 "pxor %%mm1, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2798
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2799 "movq %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2800 "punpckhbw %%mm4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2801 "punpcklbw %%mm4, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2802 // 4:0 5/6:First 7:Last
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2803
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2804 "movq %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2805 "movq %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2806 "psllw $2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2807 "psllw $2, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2808 "paddw "MANGLE(w04)", %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2809 "paddw "MANGLE(w04)", %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2810
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2811 #define NEXT\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2812 "movq (%0), %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2813 "movq (%0), %%mm3 \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2814 "add %1, %0 \n\t"\
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2815 "punpcklbw %%mm4, %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2816 "punpckhbw %%mm4, %%mm3 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2817 "paddw %%mm2, %%mm0 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2818 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2819
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2820 #define PREV\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2821 "movq (%0), %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2822 "movq (%0), %%mm3 \n\t"\
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2823 "add %1, %0 \n\t"\
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2824 "punpcklbw %%mm4, %%mm2 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2825 "punpckhbw %%mm4, %%mm3 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2826 "psubw %%mm2, %%mm0 \n\t"\
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2827 "psubw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2828
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2829
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2830 NEXT //0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2831 NEXT //1
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2832 NEXT //2
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2833 "movq %%mm0, (%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2834 "movq %%mm1, 8(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2835
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2836 NEXT //3
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2837 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2838 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2839 "movq %%mm0, 16(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2840 "movq %%mm1, 24(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2841
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2842 NEXT //4
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2843 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2844 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2845 "movq %%mm0, 32(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2846 "movq %%mm1, 40(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2847
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2848 NEXT //5
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2849 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2850 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2851 "movq %%mm0, 48(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2852 "movq %%mm1, 56(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2853
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2854 NEXT //6
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2855 "psubw %%mm5, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2856 "psubw %%mm6, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2857 "movq %%mm0, 64(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2858 "movq %%mm1, 72(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2859
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2860 "movq %%mm7, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2861 "punpckhbw %%mm4, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2862 "punpcklbw %%mm4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2863
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2864 NEXT //7
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2865 "mov %4, %0 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2866 "add %1, %0 \n\t"
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2867 PREV //0
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2868 "movq %%mm0, 80(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2869 "movq %%mm1, 88(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2870
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2871 PREV //1
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2872 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2873 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2874 "movq %%mm0, 96(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2875 "movq %%mm1, 104(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2876
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2877 PREV //2
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2878 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2879 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2880 "movq %%mm0, 112(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2881 "movq %%mm1, 120(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2882
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2883 PREV //3
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2884 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2885 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2886 "movq %%mm0, 128(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2887 "movq %%mm1, 136(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2888
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2889 PREV //4
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2890 "paddw %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2891 "paddw %%mm7, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2892 "movq %%mm0, 144(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2893 "movq %%mm1, 152(%3) \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2894
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2895 "mov %4, %0 \n\t" //FIXME
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2896
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2897 : "+&r"(src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2898 : "r" ((long)step), "m" (c->pQPb), "r"(sums), "g"(src)
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2899 );
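
Note: the asm block that ends above fills `sums` with ten running sums per column. The following is a minimal scalar sketch of that computation for a single pixel column, not the project's own reference code. The helper names (`sat_sub_u8`, `abs_diff_u8`, `select_u8`, `column_sums`) are hypothetical, and `col` is assumed to point one line above the 8x8 block, as `src` does on entry to the asm.

```c
#include <stdint.h>

/* Unsigned saturating subtract on one byte (what psubusb does per lane). */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* |a - b| from two saturating subtractions OR-ed together ("ABS Diff of lines"). */
static uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return sat_sub_u8(a, b) | sat_sub_u8(b, a);
}

/* Branch-free select: b where mask is 0xFF, a where mask is 0x00 (pxor/pand/pxor). */
static uint8_t select_u8(uint8_t a, uint8_t b, uint8_t mask)
{
    return a ^ ((a ^ b) & mask);
}

/* Hypothetical scalar equivalent of the running-sum block for one column.
 * col  : pixel one line above the 8x8 block (assumption, mirrors src in the asm)
 * step : line stride in bytes
 * qp   : quantizer for this column (one byte of c->pQPb) */
static void column_sums(const uint8_t *col, long step, uint8_t qp, int sums[10])
{
    const uint8_t *blk = col + step; /* line 0 of the block */

    /* mask is 0xFF where the edge difference is >= QP */
    uint8_t m_first = sat_sub_u8(qp, abs_diff_u8(col[0], blk[0])) == 0 ? 0xFF : 0x00;
    uint8_t m_last  = sat_sub_u8(qp, abs_diff_u8(blk[7 * step], blk[8 * step])) == 0 ? 0xFF : 0x00;

    int first = select_u8(col[0], blk[0], m_first);
    int last  = select_u8(blk[8 * step], blk[7 * step], m_last);

    sums[0] = 4 * first + blk[0] + blk[step] + blk[2 * step] + 4;
    sums[1] = sums[0] - first + blk[3 * step];
    sums[2] = sums[1] - first + blk[4 * step];
    sums[3] = sums[2] - first + blk[5 * step];
    sums[4] = sums[3] - first + blk[6 * step];
    sums[5] = sums[4] - blk[0] + blk[7 * step];
    sums[6] = sums[5] - blk[step] + last;
    sums[7] = sums[6] - blk[2 * step] + last;
    sums[8] = sums[7] - blk[3 * step] + last;
    sums[9] = sums[8] - blk[4 * step] + last;
}
```

Each `sums[i]` carries seven pixel weights, so every entry can be derived from the previous one with one subtraction and one addition. The `NEXT` and `PREV` macros follow the same pattern on eight columns at once.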
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2900
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2901 src+= step; // src points to the beginning of the 8x8 block
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2902
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2903 asm volatile(
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2904 "movq %4, %%mm6 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2905 "pcmpeqb %%mm5, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2906 "pxor %%mm6, %%mm5 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2907 "pxor %%mm7, %%mm7 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2908
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2909 "1: \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2910 "movq (%1), %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2911 "movq 8(%1), %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2912 "paddw 32(%1), %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2913 "paddw 40(%1), %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2914 "movq (%0, %3), %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2915 "movq %%mm2, %%mm3 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2916 "movq %%mm2, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2917 "punpcklbw %%mm7, %%mm2 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2918 "punpckhbw %%mm7, %%mm3 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2919 "paddw %%mm2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2920 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2921 "paddw %%mm2, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2922 "paddw %%mm3, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2923 "psrlw $4, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2924 "psrlw $4, %%mm1 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2925 "packuswb %%mm1, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2926 "pand %%mm6, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2927 "pand %%mm5, %%mm4 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2928 "por %%mm4, %%mm0 \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2929 "movq %%mm0, (%0, %3) \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2930 "add $16, %1 \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2931 "add %2, %0 \n\t"
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2932 " js 1b \n\t"
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2933
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2934 : "+r"(offset), "+r"(temp_sums)
2642
240e17c3cb2d GCC4 fix by (Keenan Pepper (keenanpepper gmail com)
michael
parents: 2527
diff changeset
2935 : "r" ((long)step), "r"(src - offset), "m"(both_masks)
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2936 );
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2937 }else
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2938 src+= step; // src points to the beginning of the 8x8 block
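
Note: the loop in the `both_masks` branch above combines two running sums with the original pixel and blends the result back under a per-column mask. A rough scalar sketch for one column follows. It assumes `sums` already holds the ten running sums for that column; `clip_u8` and `lowpass_column` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Clamp to 0..255, standing in for the saturation done by packuswb. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar equivalent of the per-line lowpass loop for one column.
 * blk  : line 0 of the 8x8 block
 * step : line stride in bytes
 * sums : ten running sums for this column (assumed precomputed)
 * mask : 0xFF to write the filtered value, 0x00 to keep the original
 *        (one byte of both_masks) */
static void lowpass_column(uint8_t *blk, long step, const int sums[10], uint8_t mask)
{
    for (int i = 0; i < 8; i++) {
        uint8_t orig = blk[i * step];
        /* 7 + 7 + 2 = 16 weights in total, hence the shift by 4 */
        uint8_t filt = clip_u8((sums[i] + sums[i + 2] + 2 * orig) >> 4);
        /* pand / pand / por: take filt where mask is set, orig elsewhere */
        blk[i * step] = (uint8_t)((filt & mask) | (orig & (uint8_t)~mask));
    }
}
```

The asm walks `offset` from `-8*step` up to zero so that the `js` branch can end the loop on the sign flag without a separate counter. The sketch uses a plain index instead.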
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2939
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2940 if(eq_mask != -1LL){
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
2941 uint8_t *temp_src= src;
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2942 asm volatile(
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2943 "pxor %%mm7, %%mm7 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2944 "lea -40(%%"REG_SP"), %%"REG_c" \n\t" // make space for 4 8-byte vars
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2945 "and "ALIGN_MASK", %%"REG_c" \n\t" // align
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2946 // 0 1 2 3 4 5 6 7 8 9
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2947 // %0 eax eax+%1 eax+2%1 %0+4%1 ecx ecx+%1 ecx+2%1 %0+8%1 ecx+4%1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2948
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2949 "movq (%0), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2950 "movq %%mm0, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2951 "punpcklbw %%mm7, %%mm0 \n\t" // low part of line 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2952 "punpckhbw %%mm7, %%mm1 \n\t" // high part of line 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2953
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2954 "movq (%0, %1), %%mm2 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2955 "lea (%0, %1, 2), %%"REG_a" \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2956 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2957 "punpcklbw %%mm7, %%mm2 \n\t" // low part of line 1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2958 "punpckhbw %%mm7, %%mm3 \n\t" // high part of line 1
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2959
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2960 "movq (%%"REG_a"), %%mm4 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2961 "movq %%mm4, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2962 "punpcklbw %%mm7, %%mm4 \n\t" // low part of line 2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2963 "punpckhbw %%mm7, %%mm5 \n\t" // high part of line 2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2964
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2965 "paddw %%mm0, %%mm0 \n\t" // 2L0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2966 "paddw %%mm1, %%mm1 \n\t" // 2H0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2967 "psubw %%mm4, %%mm2 \n\t" // L1 - L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2968 "psubw %%mm5, %%mm3 \n\t" // H1 - H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2969 "psubw %%mm2, %%mm0 \n\t" // 2L0 - L1 + L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2970 "psubw %%mm3, %%mm1 \n\t" // 2H0 - H1 + H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2971
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2972 "psllw $2, %%mm2 \n\t" // 4L1 - 4L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2973 "psllw $2, %%mm3 \n\t" // 4H1 - 4H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2974 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2975 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2976
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2977 "movq (%%"REG_a", %1), %%mm2 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2978 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2979 "punpcklbw %%mm7, %%mm2 \n\t" // L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2980 "punpckhbw %%mm7, %%mm3 \n\t" // H3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2981
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2982 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2983 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - H3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2984 "psubw %%mm2, %%mm0 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2985 "psubw %%mm3, %%mm1 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2986 "movq %%mm0, (%%"REG_c") \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2987 "movq %%mm1, 8(%%"REG_c") \n\t" // 2H0 - 5H1 + 5H2 - 2H3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2988
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2989 "movq (%%"REG_a", %1, 2), %%mm0 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2990 "movq %%mm0, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2991 "punpcklbw %%mm7, %%mm0 \n\t" // L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2992 "punpckhbw %%mm7, %%mm1 \n\t" // H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2993
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2994 "psubw %%mm0, %%mm2 \n\t" // L3 - L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2995 "psubw %%mm1, %%mm3 \n\t" // H3 - H4
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2996 "movq %%mm2, 16(%%"REG_c") \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
2997 "movq %%mm3, 24(%%"REG_c") \n\t" // H3 - H4
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2998 "paddw %%mm4, %%mm4 \n\t" // 2L2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
2999 "paddw %%mm5, %%mm5 \n\t" // 2H2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3000 "psubw %%mm2, %%mm4 \n\t" // 2L2 - L3 + L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3001 "psubw %%mm3, %%mm5 \n\t" // 2H2 - H3 + H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3002
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3003 "lea (%%"REG_a", %1), %0 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3004 "psllw $2, %%mm2 \n\t" // 4L3 - 4L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3005 "psllw $2, %%mm3 \n\t" // 4H3 - 4H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3006 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3007 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3008 //50 opcodes so far
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3009 "movq (%0, %1, 2), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3010 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3011 "punpcklbw %%mm7, %%mm2 \n\t" // L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3012 "punpckhbw %%mm7, %%mm3 \n\t" // H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3013 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3014 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3015 "psubw %%mm2, %%mm4 \n\t" // 2L2 - 5L3 + 5L4 - 2L5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3016 "psubw %%mm3, %%mm5 \n\t" // 2H2 - 5H3 + 5H4 - 2H5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3017
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3018 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3019 "punpcklbw %%mm7, %%mm6 \n\t" // L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3020 "psubw %%mm6, %%mm2 \n\t" // L5 - L6
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3021 "movq (%%"REG_a", %1, 4), %%mm6 \n\t"
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3022 "punpckhbw %%mm7, %%mm6 \n\t" // H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3023 "psubw %%mm6, %%mm3 \n\t" // H5 - H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3024
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3025 "paddw %%mm0, %%mm0 \n\t" // 2L4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3026 "paddw %%mm1, %%mm1 \n\t" // 2H4
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3027 "psubw %%mm2, %%mm0 \n\t" // 2L4 - L5 + L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3028 "psubw %%mm3, %%mm1 \n\t" // 2H4 - H5 + H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3029
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3030 "psllw $2, %%mm2 \n\t" // 4L5 - 4L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3031 "psllw $2, %%mm3 \n\t" // 4H5 - 4H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3032 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3033 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3034
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3035 "movq (%0, %1, 4), %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3036 "movq %%mm2, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3037 "punpcklbw %%mm7, %%mm2 \n\t" // L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3038 "punpckhbw %%mm7, %%mm3 \n\t" // H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3040 "paddw %%mm2, %%mm2 \n\t" // 2L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3041 "paddw %%mm3, %%mm3 \n\t" // 2H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3042 "psubw %%mm2, %%mm0 \n\t" // 2L4 - 5L5 + 5L6 - 2L7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3043 "psubw %%mm3, %%mm1 \n\t" // 2H4 - 5H5 + 5H6 - 2H7
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3044
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3045 "movq (%%"REG_c"), %%mm2 \n\t" // 2L0 - 5L1 + 5L2 - 2L3
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3046 "movq 8(%%"REG_c"), %%mm3 \n\t" // 2H0 - 5H1 + 5H2 - 2H3
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3047
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3048 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3049 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3050 "psubw %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3051 "pmaxsw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3052 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3053 "psubw %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3054 "pmaxsw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3055 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3056 "psubw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3057 "pmaxsw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3058 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3059 "psubw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3060 "pmaxsw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3061 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3062 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3063 "pcmpgtw %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3064 "pxor %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3065 "psubw %%mm6, %%mm0 \n\t" // |2L4 - 5L5 + 5L6 - 2L7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3066 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3067 "pcmpgtw %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3068 "pxor %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3069 "psubw %%mm6, %%mm1 \n\t" // |2H4 - 5H5 + 5H6 - 2H7|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3070 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3071 "pcmpgtw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3072 "pxor %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3073 "psubw %%mm6, %%mm2 \n\t" // |2L0 - 5L1 + 5L2 - 2L3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3074 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3075 "pcmpgtw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3076 "pxor %%mm6, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3077 "psubw %%mm6, %%mm3 \n\t" // |2H0 - 5H1 + 5H2 - 2H3|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3078 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3079
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3080 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3081 "pminsw %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3082 "pminsw %%mm3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3083 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3084 "movq %%mm0, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3085 "psubusw %%mm2, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3086 "psubw %%mm6, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3087 "movq %%mm1, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3088 "psubusw %%mm3, %%mm6 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3089 "psubw %%mm6, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3090 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3091
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3092 "movd %2, %%mm2 \n\t" // QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3093 "punpcklbw %%mm7, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3094
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3095 "movq %%mm7, %%mm6 \n\t" // 0
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3096 "pcmpgtw %%mm4, %%mm6 \n\t" // sign(2L2 - 5L3 + 5L4 - 2L5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3097 "pxor %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3098 "psubw %%mm6, %%mm4 \n\t" // |2L2 - 5L3 + 5L4 - 2L5|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3099 "pcmpgtw %%mm5, %%mm7 \n\t" // sign(2H2 - 5H3 + 5H4 - 2H5)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3100 "pxor %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3101 "psubw %%mm7, %%mm5 \n\t" // |2H2 - 5H3 + 5H4 - 2H5|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3102 // 100 opcodes
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3103 "psllw $3, %%mm2 \n\t" // 8QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3104 "movq %%mm2, %%mm3 \n\t" // 8QP
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3105 "pcmpgtw %%mm4, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3106 "pcmpgtw %%mm5, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3107 "pand %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3108 "pand %%mm3, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3109
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3110
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3111 "psubusw %%mm0, %%mm4 \n\t" // ld
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3112 "psubusw %%mm1, %%mm5 \n\t" // hd
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3113
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3114
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3115 "movq "MANGLE(w05)", %%mm2 \n\t" // 5
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3116 "pmullw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3117 "pmullw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3118 "movq "MANGLE(w20)", %%mm2 \n\t" // 32
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3119 "paddw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3120 "paddw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3121 "psrlw $6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3122 "psrlw $6, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3123
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3124 "movq 16(%%"REG_c"), %%mm0 \n\t" // L3 - L4
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3125 "movq 24(%%"REG_c"), %%mm1 \n\t" // H3 - H4
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3126
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3127 "pxor %%mm2, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3128 "pxor %%mm3, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3129
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3130 "pcmpgtw %%mm0, %%mm2 \n\t" // sign (L3-L4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3131 "pcmpgtw %%mm1, %%mm3 \n\t" // sign (H3-H4)
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3132 "pxor %%mm2, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3133 "pxor %%mm3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3134 "psubw %%mm2, %%mm0 \n\t" // |L3-L4|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3135 "psubw %%mm3, %%mm1 \n\t" // |H3-H4|
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3136 "psrlw $1, %%mm0 \n\t" // |L3 - L4|/2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3137 "psrlw $1, %%mm1 \n\t" // |H3 - H4|/2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3138
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3139 "pxor %%mm6, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3140 "pxor %%mm7, %%mm3 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3141 "pand %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3142 "pand %%mm3, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3143
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3144 #ifdef HAVE_MMX2
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3145 "pminsw %%mm0, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3146 "pminsw %%mm1, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3147 #else
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3148 "movq %%mm4, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3149 "psubusw %%mm0, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3150 "psubw %%mm2, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3151 "movq %%mm5, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3152 "psubusw %%mm1, %%mm2 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3153 "psubw %%mm2, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3154 #endif
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3155 "pxor %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3156 "pxor %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3157 "psubw %%mm6, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3158 "psubw %%mm7, %%mm5 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3159 "packsswb %%mm5, %%mm4 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3160 "movq %3, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3161 "pandn %%mm4, %%mm1 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3162 "movq (%0), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3163 "paddb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3164 "movq %%mm0, (%0) \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3165 "movq (%0, %1), %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3166 "psubb %%mm1, %%mm0 \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3167 "movq %%mm0, (%0, %1) \n\t"
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3168
2040
5de466b3360e per line lowpass filter in mmx
michael
parents: 2039
diff changeset
3169 : "+r" (temp_src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3170 : "r" ((long)step), "m" (c->pQPb), "m"(eq_mask)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3171 : "%"REG_a, "%"REG_c
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3172 );
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3173 }
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3174 /*if(step==16){
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3175 STOP_TIMER("step16")
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3176 }else{
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3177 STOP_TIMER("stepX")
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3178 }*/
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3179 }
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3180 #endif //HAVE_MMX
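
Editor's note: the MMX block above works on eight pixel columns at once, which makes the arithmetic hard to follow. The sketch below is a scalar reading of one 16-bit lane, reconstructed only from the instructions and comments visible in this listing. It is not the file's C fallback. The names abs16_lane, sat_sub_u16, min16_lane and deblock_delta are hypothetical, and mid, left and right stand for the three sums annotated in the comments (2L2 - 5L3 + 5L4 - 2L5, 2L0 - 5L1 + 5L2 - 2L3 and 2L4 - 5L5 + 5L6 - 2L7). The eq_mask gating (pandn) and the wrapping byte add/subtract at the end are left out.

```c
#include <assert.h>
#include <stdint.h>

/* pcmpgtw + pxor + psubw: branchless |x| for one signed 16-bit lane. */
static uint16_t abs16_lane(int16_t x)
{
    uint16_t mask = (x < 0) ? 0xFFFFu : 0u;          /* pcmpgtw: 0 > x  */
    return (uint16_t)(((uint16_t)x ^ mask) - mask);  /* pxor, then psubw */
}

/* psubusw: unsigned subtract that saturates at zero. */
static uint16_t sat_sub_u16(uint16_t a, uint16_t b)
{
    return (a > b) ? (uint16_t)(a - b) : 0;
}

/* movq + psubusw + psubw: the plain-MMX stand-in for pminsw.
 * a - max(a - b, 0) equals min(a, b) for non-negative lanes. */
static uint16_t min16_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(a - sat_sub_u16(a, b));
}

/* Hypothetical helper: signed correction for one pixel column.
 * The assembly adds the result to one line and subtracts it from the next. */
static int deblock_delta(int mid, int left, int right, int l3_minus_l4, int qp)
{
    uint16_t amid = abs16_lane((int16_t)mid);
    uint16_t side = min16_lane(abs16_lane((int16_t)left),
                               abs16_lane((int16_t)right));
    uint16_t d, q;

    if (amid >= 8 * qp)                    /* pcmpgtw against 8QP, pand */
        return 0;
    d = sat_sub_u16(amid, side);           /* psubusw                   */
    d = (uint16_t)((5 * d + 32) >> 6);     /* pmullw w05, paddw w20, psrlw 6 */
    if ((mid < 0) == (l3_minus_l4 < 0))    /* sign masks xor to zero    */
        return 0;
    q = (uint16_t)(abs16_lane((int16_t)l3_minus_l4) >> 1);
    d = min16_lane(d, q);                  /* clamp to |L3 - L4| / 2    */
    return (mid < 0) ? -(int)d : (int)d;   /* pxor/psubw restore sign   */
}

/* Small self-check of the lane helpers. */
static void check_lanes(void)
{
    assert(abs16_lane(-5) == 5);
    assert(min16_lane(9, 3) == 3);
}
```

For instance, with mid = 64, both side sums zero, L3 - L4 = -20 and QP = 10, the helper yields +5: the two lines move toward each other by 5 each, which is less than the cap of 10 (half the step).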
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3181
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3182 static void RENAME(postProcess)(uint8_t src[], int srcStride, uint8_t dst[], int dstStride, int width, int height,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3183 QP_STORE_T QPs[], int QPStride, int isColor, PPContext *c);
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3184
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3185 /**
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3186 * Copies a block from src to dst and fixes the blacklevel
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3187 * levelFix == 0 -> don't touch the brightness & contrast
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3188 */
634
be1cb0e1f276 warning fixes by Dominik Mierzejewski <dominik@rangers.eu.org>
arpi
parents: 600
diff changeset
3189 #undef SCALED_CPY
be1cb0e1f276 warning fixes by Dominik Mierzejewski <dominik@rangers.eu.org>
arpi
parents: 600
diff changeset
3190
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3191 static inline void RENAME(blockCopy)(uint8_t dst[], int dstStride, uint8_t src[], int srcStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3192 int levelFix, int64_t *packedOffsetAndScale)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3193 {
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3194 #ifndef HAVE_MMX
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3195 int i;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3196 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3197 if(levelFix)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3198 {
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3199 #ifdef HAVE_MMX
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3200 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3201 "movq (%%"REG_a"), %%mm2 \n\t" // packedYOffset
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3202 "movq 8(%%"REG_a"), %%mm3 \n\t" // packedYScale
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3203 "lea (%2,%4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3204 "lea (%3,%5), %%"REG_d" \n\t"
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3205 "pxor %%mm4, %%mm4 \n\t"
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3206 #ifdef HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3207 #define REAL_SCALED_CPY(src1, src2, dst1, dst2) \
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3208 "movq " #src1 ", %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3209 "movq " #src1 ", %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3210 "movq " #src2 ", %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3211 "movq " #src2 ", %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3212 "punpcklbw %%mm0, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3213 "punpckhbw %%mm5, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3214 "punpcklbw %%mm1, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3215 "punpckhbw %%mm6, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3216 "pmulhuw %%mm3, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3217 "pmulhuw %%mm3, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3218 "pmulhuw %%mm3, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3219 "pmulhuw %%mm3, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3220 "psubw %%mm2, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3221 "psubw %%mm2, %%mm5 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3222 "psubw %%mm2, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3223 "psubw %%mm2, %%mm6 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3224 "packuswb %%mm5, %%mm0 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3225 "packuswb %%mm6, %%mm1 \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3226 "movq %%mm0, " #dst1 " \n\t"\
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3227 "movq %%mm1, " #dst2 " \n\t"\
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3228
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3229 #else //HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3230 #define REAL_SCALED_CPY(src1, src2, dst1, dst2) \
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3231 "movq " #src1 ", %%mm0 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3232 "movq " #src1 ", %%mm5 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3233 "punpcklbw %%mm4, %%mm0 \n\t"\
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3234 "punpckhbw %%mm4, %%mm5 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3235 "psubw %%mm2, %%mm0 \n\t"\
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3236 "psubw %%mm2, %%mm5 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3237 "movq " #src2 ", %%mm1 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3238 "psllw $6, %%mm0 \n\t"\
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3239 "psllw $6, %%mm5 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3240 "pmulhw %%mm3, %%mm0 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3241 "movq " #src2 ", %%mm6 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3242 "pmulhw %%mm3, %%mm5 \n\t"\
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3243 "punpcklbw %%mm4, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3244 "punpckhbw %%mm4, %%mm6 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3245 "psubw %%mm2, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3246 "psubw %%mm2, %%mm6 \n\t"\
117
a02f3088b0cf negative black bugfix
michael
parents: 116
diff changeset
3247 "psllw $6, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3248 "psllw $6, %%mm6 \n\t"\
101
fcf4e8fcb34b fixed a sig4 bug an non mmx2 cpus (in case of more sig4 errors please send me a "disassemble $eip-16 $eip+16" from gdb)
michael
parents: 100
diff changeset
3249 "pmulhw %%mm3, %%mm1 \n\t"\
118
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3250 "pmulhw %%mm3, %%mm6 \n\t"\
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3251 "packuswb %%mm5, %%mm0 \n\t"\
3dd1950ac98d brightness / contrast fix/copy optimizations +2% speedup
michael
parents: 117
diff changeset
3252 "packuswb %%mm6, %%mm1 \n\t"\
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3253 "movq %%mm0, " #dst1 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3254 "movq %%mm1, " #dst2 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3255
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3256 #endif //!HAVE_MMX2
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3257 #define SCALED_CPY(src1, src2, dst1, dst2)\
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3258 REAL_SCALED_CPY(src1, src2, dst1, dst2)
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3259
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3260 SCALED_CPY((%2) , (%2, %4) , (%3) , (%3, %5))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3261 SCALED_CPY((%2, %4, 2), (%%REGa, %4, 2), (%3, %5, 2), (%%REGd, %5, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3262 SCALED_CPY((%2, %4, 4), (%%REGa, %4, 4), (%3, %5, 4), (%%REGd, %5, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3263 "lea (%%"REG_a",%4,4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3264 "lea (%%"REG_d",%5,4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3265 SCALED_CPY((%%REGa, %4), (%%REGa, %4, 2), (%%REGd, %5), (%%REGd, %5, 2))
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3266
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3267
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3268 : "=&a" (packedOffsetAndScale)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3269 : "0" (packedOffsetAndScale),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3270 "r"(src),
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3271 "r"(dst),
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3272 "r" ((long)srcStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3273 "r" ((long)dstStride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3274 : "%"REG_d
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3275 );
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3276 #else
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3277 for(i=0; i<8; i++)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3278 memcpy( &(dst[dstStride*i]),
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3279 &(src[srcStride*i]), BLOCK_SIZE);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3280 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3281 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3282 else
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3283 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3284 #ifdef HAVE_MMX
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3285 asm volatile(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3286 "lea (%0,%2), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3287 "lea (%1,%3), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3288
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3289 #define REAL_SIMPLE_CPY(src1, src2, dst1, dst2) \
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3290 "movq " #src1 ", %%mm0 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3291 "movq " #src2 ", %%mm1 \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3292 "movq %%mm0, " #dst1 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3293 "movq %%mm1, " #dst2 " \n\t"\
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3294
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3295 #define SIMPLE_CPY(src1, src2, dst1, dst2)\
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3296 REAL_SIMPLE_CPY(src1, src2, dst1, dst2)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3297
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3298 SIMPLE_CPY((%0) , (%0, %2) , (%1) , (%1, %3))
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3299 SIMPLE_CPY((%0, %2, 2), (%%REGa, %2, 2), (%1, %3, 2), (%%REGd, %3, 2))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3300 SIMPLE_CPY((%0, %2, 4), (%%REGa, %2, 4), (%1, %3, 4), (%%REGd, %3, 4))
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3301 "lea (%%"REG_a",%2,4), %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3302 "lea (%%"REG_d",%3,4), %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3303 SIMPLE_CPY((%%REGa, %2), (%%REGa, %2, 2), (%%REGd, %3), (%%REGd, %3, 2))
166
ec349ac7869b 1% speedup
michael
parents: 165
diff changeset
3304
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3305 : : "r" (src),
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3306 "r" (dst),
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3307 "r" ((long)srcStride),
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3308 "r" ((long)dstStride)
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3309 : "%"REG_a, "%"REG_d
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3310 );
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3311 #else
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3312 for(i=0; i<8; i++)
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3313 memcpy( &(dst[dstStride*i]),
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3314 &(src[srcStride*i]), BLOCK_SIZE);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3315 #endif
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3316 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3317 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3318
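The non-MMX2 `REAL_SCALED_CPY` path above can be read as a per-pixel level correction. The following is only a scalar sketch of that reading, not code from the file: it assumes that each 16-bit lane of `%%mm2` holds the offset and each lane of `%%mm3` holds the scale multiplied by 1024 (both registers are loaded outside this excerpt), and that all intermediates fit in 16 bits as they must in the MMX version. The names `scaled_pixel` and `scaled_row` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of one pixel of the non-MMX2 REAL_SCALED_CPY.
 * Assumption: offset is one 16-bit lane of %%mm2, scale1024 one lane of
 * %%mm3 (scale * 1024), and (p - offset) * 64 stays within int16 range. */
static uint8_t scaled_pixel(uint8_t p, int offset, int scale1024)
{
    int v    = (p - offset) * 64;   /* psubw, then psllw $6 */
    int prod = v * scale1024;       /* pmulhw keeps the high 16 bits of this */
    /* floor(prod / 65536) without relying on right-shifting a negative int */
    int hi   = prod >= 0 ? prod >> 16 : -((-prod + 65535) >> 16);

    if (hi < 0)   hi = 0;           /* packuswb saturates to 0..255 */
    if (hi > 255) hi = 255;
    return (uint8_t)hi;
}

/* Copies one 8-pixel row with level correction applied to every pixel. */
static void scaled_row(uint8_t *dst, const uint8_t *src, int offset, int scale1024)
{
    for (int x = 0; x < 8; x++)
        dst[x] = scaled_pixel(src[x], offset, scale1024);
}
```

With a scale of 1.0 (`scale1024 == 1024`) and an offset of 0 the pixel is returned unchanged, which is a quick sanity check on the shift-by-6 and high-word multiply: 64 * 1024 equals 65536, so the two operations cancel.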
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3319 /**
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3320 * Duplicates the given 8 src pixels 3 times upward
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3321 */
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3322 static inline void RENAME(duplicate)(uint8_t src[], int stride)
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3323 {
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3324 #ifdef HAVE_MMX
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3325 asm volatile(
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3326 "movq (%0), %%mm0 \n\t"
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3327 "add %1, %0 \n\t"
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3328 "movq %%mm0, (%0) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3329 "movq %%mm0, (%0, %1) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3330 "movq %%mm0, (%0, %1, 2) \n\t"
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3331 : "+r" (src)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3332 : "r" ((long)-stride)
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3333 );
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3334 #else
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3335 int i;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3336 uint8_t *p=src;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3337 for(i=0; i<3; i++)
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3338 {
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3339 p-= stride;
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3340 memcpy(p, src, 8);
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3341 }
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3342 #endif
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3343 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3344
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3345 /**
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3346 * Filters array of bytes (Y or U or V values)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3347 */
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3348 static void RENAME(postProcess)(uint8_t src[], int srcStride, uint8_t dst[], int dstStride, int width, int height,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3349 QP_STORE_T QPs[], int QPStride, int isColor, PPContext *c2)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3350 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3351 PPContext __attribute__((aligned(8))) c= *c2; //copy to stack for faster access
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3352 int x,y;
172
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3353 #ifdef COMPILE_TIME_MODE
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3354 const int mode= COMPILE_TIME_MODE;
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3355 #else
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3356 const int mode= isColor ? c.ppMode.chromMode : c.ppMode.lumMode;
172
a0efaf471d6b compiletime pp-mode support (luminance = chrominance filters though) 1-2% faster with -benchmark -vo null -nosound
michael
parents: 169
diff changeset
3357 #endif
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3358 int black=0, white=255; // blackest black and whitest white in the picture
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3359 int QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3360
886
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3361 int copyAhead;
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3362 #ifdef HAVE_MMX
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3363 int i;
3abff5a87548 warning patch by (Dominik Mierzejewski <dominik at rangers dot eu dot org>)
michael
parents: 810
diff changeset
3364 #endif
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3365
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3366 const int qpHShift= isColor ? 4-c.hChromaSubSample : 4;
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3367 const int qpVShift= isColor ? 4-c.vChromaSubSample : 4;
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3368
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3369 //FIXME remove
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3370 uint64_t * const yHistogram= c.yHistogram;
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3371 uint8_t * const tempSrc= srcStride > 0 ? c.tempSrc : c.tempSrc - 23*srcStride;
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3372 uint8_t * const tempDst= dstStride > 0 ? c.tempDst : c.tempDst - 23*dstStride;
2031
4225c131a2eb warning fixes by (Michael Roitzsch <mroi at users dot sourceforge dot net>)
michael
parents: 1724
diff changeset
3373 //const int mbWidth= isColor ? (width+7)>>3 : (width+15)>>4;
182
3ccd74a91074 minor brightness/contrast bugfix / moved some global vars into ppMode
michael
parents: 181
diff changeset
3374
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
3375 #ifdef HAVE_MMX
1724
ea5200a9f730 mpeg2 QP clamping fix
michael
parents: 1581
diff changeset
3376 for(i=0; i<57; i++){
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3377 int offset= ((i*c.ppMode.baseDcDiff)>>8) + 1;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3378 int threshold= offset*2 + 1;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3379 c.mmxDcOffset[i]= 0x7F - offset;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3380 c.mmxDcThreshold[i]= 0x7F - threshold;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3381 c.mmxDcOffset[i]*= 0x0101010101010101LL;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3382 c.mmxDcThreshold[i]*= 0x0101010101010101LL;
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3383 }
158
d1a4f4ca7178 temp denoiser:
michael
parents: 157
diff changeset
3384 #endif
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3385
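The loop above fills `mmxDcOffset` and `mmxDcThreshold` by multiplying a small value by `0x0101010101010101`, which copies one byte into all eight byte lanes of a 64-bit word. A minimal sketch of that trick follows; the helper names are hypothetical, and it assumes `0x7F - offset` and `0x7F - threshold` stay in the range 0..0x7F, which the listing does not check.

```c
#include <stdint.h>

/* Replicates one byte into all eight byte lanes of a 64-bit word. */
static uint64_t splat_byte(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}

/* Hypothetical per-entry versions of the table setup in the listing.
 * Assumption: the results of 0x7F - offset and 0x7F - threshold are
 * non-negative, so they fit in a single byte. */
static uint64_t dc_offset_entry(int i, int baseDcDiff)
{
    int offset = ((i * baseDcDiff) >> 8) + 1;
    return splat_byte((uint8_t)(0x7F - offset));
}

static uint64_t dc_threshold_entry(int i, int baseDcDiff)
{
    int offset    = ((i * baseDcDiff) >> 8) + 1;
    int threshold = offset * 2 + 1;
    return splat_byte((uint8_t)(0x7F - threshold));
}
```

Calling `dc_offset_entry(i, baseDcDiff)` for `i` from 0 to 56 would reproduce the 57 table entries under the stated assumption.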
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3386 if(mode & CUBIC_IPOL_DEINT_FILTER) copyAhead=16;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3387 else if( (mode & LINEAR_BLEND_DEINT_FILTER)
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3388 || (mode & FFMPEG_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3389 || (mode & LOWPASS5_DEINT_FILTER)) copyAhead=14;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3390 else if( (mode & V_DEBLOCK)
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3391 || (mode & LINEAR_IPOL_DEINT_FILTER)
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3392 || (mode & MEDIAN_DEINT_FILTER)
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3393 || (mode & V_A_DEBLOCK)) copyAhead=13;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3394 else if(mode & V_X1_FILTER) copyAhead=11;
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3395 // else if(mode & V_RK1_FILTER) copyAhead=10;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3396 else if(mode & DERING) copyAhead=9;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3397 else copyAhead=8;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3398
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3399 copyAhead-= 8;
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3400
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3401 if(!isColor)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3402 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3403 uint64_t sum= 0;
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3404 int i;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3405 uint64_t maxClipped;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3406 uint64_t clipped;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3407 double scale;
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3408
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3409 c.frameNum++;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3410 // first frame is fscked so we ignore it
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3411 if(c.frameNum == 1) yHistogram[0]= width*height/64*15/256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3412
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3413 for(i=0; i<256; i++)
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3414 {
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3415 sum+= yHistogram[i];
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3416 // printf("%d ", yHistogram[i]);
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3417 }
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3418 // printf("\n\n");
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3419
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3420 /* we always get a completely black picture first */
793
8e9faf69110f cleanup
michael
parents: 791
diff changeset
3421 maxClipped= (uint64_t)(sum * c.ppMode.maxClippedThreshold);
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3422
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3423 clipped= sum;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3424 for(black=255; black>0; black--)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3425 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3426 if(clipped < maxClipped) break;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3427 clipped-= yHistogram[black];
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3428 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3429
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3430 clipped= sum;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3431 for(white=0; white<256; white++)
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3432 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3433 if(clipped < maxClipped) break;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3434 clipped-= yHistogram[white];
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3435 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3436
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3437 scale= (double)(c.ppMode.maxAllowedY - c.ppMode.minAllowedY) / (double)(white-black);
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3438
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3439 #ifdef HAVE_MMX2
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3440 c.packedYScale= (uint16_t)(scale*256.0 + 0.5);
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3441 c.packedYOffset= (((black*c.packedYScale)>>8) - c.ppMode.minAllowedY) & 0xFFFF;
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3442 #else
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3443 c.packedYScale= (uint16_t)(scale*1024.0 + 0.5);
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3444 c.packedYOffset= (black - c.ppMode.minAllowedY) & 0xFFFF;
173
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3445 #endif
37eaaa9596cc faster brightness correcture in MMX2
michael
parents: 172
diff changeset
3446
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3447 c.packedYOffset|= c.packedYOffset<<32;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3448 c.packedYOffset|= c.packedYOffset<<16;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3449
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3450 c.packedYScale|= c.packedYScale<<32;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3451 c.packedYScale|= c.packedYScale<<16;
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3452
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3453 if(mode & LEVEL_FIX) QPCorrecture= (int)(scale*256*256 + 0.5);
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3454 else QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3455 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3456 else
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3457 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3458 c.packedYScale= 0x0100010001000100LL;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3459 c.packedYOffset= 0;
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
3460 QPCorrecture= 256*256;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3461 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3462
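The luminance branch above derives `black`, `white` and `scale` from the Y histogram, then packs 16-bit values into four lanes with two shift-and-OR steps. The sketch below restates that logic as standalone functions so it can be tried in isolation. The `Levels` type and the names `find_levels`, `splat16` and `two_peak_example` are hypothetical, the values 16 and 234 for the allowed range and 0.01 for the threshold are illustrative only, and the sketch assumes `white != black` (the listing does not guard the division either).

```c
#include <stdint.h>

typedef struct { int black, white; double scale; } Levels; /* hypothetical */

/* Mirrors the histogram scan in the listing: walk from each end until the
 * running count drops below the clipping budget. */
static Levels find_levels(const uint64_t hist[256], double maxClippedThreshold,
                          int minAllowedY, int maxAllowedY)
{
    Levels lv;
    uint64_t sum = 0, maxClipped, clipped;
    int i;

    for (i = 0; i < 256; i++)
        sum += hist[i];
    maxClipped = (uint64_t)(sum * maxClippedThreshold);

    clipped = sum;
    for (lv.black = 255; lv.black > 0; lv.black--) {
        if (clipped < maxClipped) break;
        clipped -= hist[lv.black];
    }

    clipped = sum;
    for (lv.white = 0; lv.white < 256; lv.white++) {
        if (clipped < maxClipped) break;
        clipped -= hist[lv.white];
    }

    /* Assumption: white != black, otherwise this divides by zero. */
    lv.scale = (double)(maxAllowedY - minAllowedY) / (double)(lv.white - lv.black);
    return lv;
}

/* Replicates a 16-bit value into all four lanes, as done for packedYScale. */
static uint64_t splat16(uint16_t v)
{
    uint64_t x = v;
    x |= x << 32;
    x |= x << 16;
    return x;
}

/* Illustrative input: two equal peaks at Y=50 and Y=200. */
static Levels two_peak_example(void)
{
    uint64_t hist[256] = {0};
    hist[50]  = 100;
    hist[200] = 100;
    return find_levels(hist, 0.01, 16, 234);
}
```

For the two-peak histogram, the scans stop one step past each peak, giving `black == 49` and `white == 201`. `splat16(0x0100)` yields `0x0100010001000100`, the same constant the chroma branch assigns to `c.packedYScale`.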
148
1cfc4d567c0a minor changes (fixed some warnings, added attribute aligned(8) stuff)
michael
parents: 142
diff changeset
3463 /* copy & deinterlace first row of blocks */
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3464 y=-BLOCK_SIZE;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3465 {
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3466 uint8_t *srcBlock= &(src[y*srcStride]);
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3467 uint8_t *dstBlock= tempDst + dstStride;
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3468
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3469 // From this point on it is guaranteed that we can read and write 16 lines downward
2677
7b7613020f2c remove/replace non-ascii characters
mru
parents: 2642
diff changeset
3470 // finish 1 block before the next otherwise we might have a problem
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3471 // with the L1 Cache of the P4 ... or only a few blocks at a time or something
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3472 for(x=0; x<width; x+=BLOCK_SIZE)
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3473 {
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3474
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3475 #ifdef HAVE_MMX2
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3476 /*
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3477 prefetchnta(srcBlock + (((x>>2)&6) + 5)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3478 prefetchnta(srcBlock + (((x>>2)&6) + 6)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3479 prefetcht0(dstBlock + (((x>>2)&6) + 5)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3480 prefetcht0(dstBlock + (((x>>2)&6) + 6)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3481 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3482
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3483 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3484 "mov %4, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3485 "shr $2, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3486 "and $6, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3487 "add %5, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3488 "mov %%"REG_a", %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3489 "imul %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3490 "imul %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3491 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3492 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3493 "add %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3494 "add %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3495 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3496 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3497 :: "r" (srcBlock), "r" ((long)srcStride), "r" (dstBlock), "r" ((long)dstStride),
2767
49da251f2608 GCC4 fix
gpoirier
parents: 2677
diff changeset
3498 "g" ((long)x), "g" ((long)copyAhead)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3499 : "%"REG_a, "%"REG_d
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3500 );
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3501
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3502 #elif defined(HAVE_3DNOW)
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3503 //FIXME check if this is faster on a 3dnow chip or if it's faster without the prefetch or ...
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3504 /* prefetch(srcBlock + (((x>>3)&3) + 5)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3505 prefetch(srcBlock + (((x>>3)&3) + 9)*srcStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3506 prefetchw(dstBlock + (((x>>3)&3) + 5)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3507 prefetchw(dstBlock + (((x>>3)&3) + 9)*dstStride + 32);
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3508 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3509 #endif
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3510
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3511 RENAME(blockCopy)(dstBlock + dstStride*8, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3512 srcBlock + srcStride*8, srcStride, mode & LEVEL_FIX, &c.packedYOffset);
224
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3513
8b3e70afa2ba top row bugfix
michael
parents: 223
diff changeset
3514 RENAME(duplicate)(dstBlock + dstStride*8, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3515
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3516 if(mode & LINEAR_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3517 RENAME(deInterlaceInterpolateLinear)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3518 else if(mode & LINEAR_BLEND_DEINT_FILTER)
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
3519 RENAME(deInterlaceBlendLinear)(dstBlock, dstStride, c.deintTemp + x);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3520 else if(mode & MEDIAN_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3521 RENAME(deInterlaceMedian)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3522 else if(mode & CUBIC_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3523 RENAME(deInterlaceInterpolateCubic)(dstBlock, dstStride);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3524 else if(mode & FFMPEG_DEINT_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3525 RENAME(deInterlaceFF)(dstBlock, dstStride, c.deintTemp + x);
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3526 else if(mode & LOWPASS5_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3527 RENAME(deInterlaceL5)(dstBlock, dstStride, c.deintTemp + x, c.deintTemp + width + x);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3528 /* else if(mode & CUBIC_BLEND_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3529 RENAME(deInterlaceBlendCubic)(dstBlock, dstStride);
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3530 */
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3531 dstBlock+=8;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3532 srcBlock+=8;
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3533 }
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3534 if(width==ABS(dstStride))
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3535 linecpy(dst, tempDst + 9*dstStride, copyAhead, dstStride);
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3536 else
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3537 {
943
0566d1a8426f 10l (int i)
michael
parents: 941
diff changeset
3538 int i;
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3539 for(i=0; i<copyAhead; i++)
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3540 {
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3541 memcpy(dst + i*dstStride, tempDst + (9+i)*dstStride, width);
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3542 }
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3543 }
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3544 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3545
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3546 //printf("\n");
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3547 for(y=0; y<height; y+=BLOCK_SIZE)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3548 {
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3549 //1% speedup if these are here instead of the inner loop
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3550 uint8_t *srcBlock= &(src[y*srcStride]);
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3551 uint8_t *dstBlock= &(dst[y*dstStride]);
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3552 #ifdef HAVE_MMX
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3553 uint8_t *tempBlock1= c.tempBlocks;
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3554 uint8_t *tempBlock2= c.tempBlocks + 8;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3555 #endif
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3556 int8_t *QPptr= &QPs[(y>>qpVShift)*QPStride];
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3557 int8_t *nonBQPptr= &c.nonBQPTable[(y>>qpVShift)*ABS(QPStride)];
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3558 int QP=0;
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3559 /* can we mess with an 8x16 block from srcBlock/dstBlock downwards and 1 line upwards
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3560 if not, then use a temporary buffer */
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3561 if(y+15 >= height)
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3562 {
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3563 int i;
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3564 /* copy from line (copyAhead) to (copyAhead+7) of src, these will be copied with
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3565 blockcopy to dst later */
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3566 linecpy(tempSrc + srcStride*copyAhead, srcBlock + srcStride*copyAhead,
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3567 MAX(height-y-copyAhead, 0), srcStride);
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3568
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3569 /* duplicate last line of src to fill the void up to line (copyAhead+7) */
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3570 for(i=MAX(height-y, 8); i<copyAhead+8; i++)
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3571 memcpy(tempSrc + srcStride*i, src + srcStride*(height-1), ABS(srcStride));
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3572
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3573 /* copy up to (copyAhead+1) lines of dst (line -1 to (copyAhead-1))*/
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3574 linecpy(tempDst, dstBlock - dstStride, MIN(height-y+1, copyAhead+1), dstStride);
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3575
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3576 /* duplicate last line of dst to fill the void up to line (copyAhead) */
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3577 for(i=height-y+1; i<=copyAhead; i++)
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3578 memcpy(tempDst + dstStride*i, dst + dstStride*(height-1), ABS(dstStride));
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3579
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3580 dstBlock= tempDst + dstStride;
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3581 srcBlock= tempSrc;
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3582 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3583 //printf("\n");
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3584
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3585 // From this point on it is guaranteed that we can read and write 16 lines downward
2677
7b7613020f2c remove/replace non-ascii characters
mru
parents: 2642
diff changeset
3586 // finish 1 block before the next otherwise we might have a problem
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3587 // with the L1 Cache of the P4 ... or only a few blocks at a time or something
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3588 for(x=0; x<width; x+=BLOCK_SIZE)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3589 {
97
e57b1d38d71f bugfixes: last 3 lines not brightness/contrast corrected
michael
parents: 96
diff changeset
3590 const int stride= dstStride;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3591 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3592 uint8_t *tmpXchg;
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3593 #endif
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3594 if(isColor)
121
3ecf2a90c65e more speed
michael
parents: 120
diff changeset
3595 {
957
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3596 QP= QPptr[x>>qpHShift];
8a95bda80fdc YUV 411/422/444 support for pp
michael
parents: 944
diff changeset
3597 c.nonBQP= nonBQPptr[x>>qpHShift];
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3598 }
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3599 else
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3600 {
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3601 QP= QPptr[x>>4];
223
f0e15c953995 minor QP bugfix
michael
parents: 211
diff changeset
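3602 QP= (QP* QPCorrecture + 256*128)>>16; // 16.16 fixed point: QPCorrecture is the scale times 65536, and adding 256*128 (0.5) rounds to nearest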
791
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3603 c.nonBQP= nonBQPptr[x>>4];
4f61ca80b6c1 better deblocking filter
michael
parents: 789
diff changeset
3604 c.nonBQP= (c.nonBQP* QPCorrecture + 256*128)>>16;
148
1cfc4d567c0a minor changes (fixed some warnings, added attribute aligned(8) stuff)
michael
parents: 142
diff changeset
3605 yHistogram[ srcBlock[srcStride*12 + 4] ]++;
121
3ecf2a90c65e more speed
michael
parents: 120
diff changeset
3606 }
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3607 c.QP= QP;
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3608 #ifdef HAVE_MMX
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3609 asm volatile(
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3610 "movd %1, %%mm7 \n\t"
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3611 "packuswb %%mm7, %%mm7 \n\t" // 0, 0, 0, QP, 0, 0, 0, QP
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3612 "packuswb %%mm7, %%mm7 \n\t" // 0,QP, 0, QP, 0,QP, 0, QP
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3613 "packuswb %%mm7, %%mm7 \n\t" // QP,..., QP
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3614 "movq %%mm7, %0 \n\t"
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3615 : "=m" (c.pQPb)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3616 : "r" (QP)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3617 );
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3618 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3619
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3620
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3621 #ifdef HAVE_MMX2
126
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3622 /*
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3623 prefetchnta(srcBlock + (((x>>2)&6) + 5)*srcStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3624 prefetchnta(srcBlock + (((x>>2)&6) + 6)*srcStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3625 prefetcht0(dstBlock + (((x>>2)&6) + 5)*dstStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3626 prefetcht0(dstBlock + (((x>>2)&6) + 6)*dstStride + 32);
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3627 */
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3628
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3629 asm(
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3630 "mov %4, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3631 "shr $2, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3632 "and $6, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3633 "add %5, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3634 "mov %%"REG_a", %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3635 "imul %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3636 "imul %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3637 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3638 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3639 "add %1, %%"REG_a" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3640 "add %3, %%"REG_d" \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3641 "prefetchnta 32(%%"REG_a", %0) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3642 "prefetcht0 32(%%"REG_d", %2) \n\t"
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3643 :: "r" (srcBlock), "r" ((long)srcStride), "r" (dstBlock), "r" ((long)dstStride),
2767
49da251f2608 GCC4 fix
gpoirier
parents: 2677
diff changeset
3644 "g" ((long)x), "g" ((long)copyAhead)
2293
15cfba1b97b5 adapting existing mmx/mmx2/sse/3dnow optimizations so they work on x86_64 patch by (Aurelien Jacobs <aurel at gnuage dot org>)
michael
parents: 2276
diff changeset
3645 : "%"REG_a, "%"REG_d
126
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3646 );
55f57883bbf5 more speed
michael
parents: 121
diff changeset
3647
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3648 #elif defined(HAVE_3DNOW)
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3649 //FIXME check if this is faster on a 3dnow chip or if it's faster without the prefetch or ...
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3650 /* prefetch(srcBlock + (((x>>3)&3) + 5)*srcStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3651 prefetch(srcBlock + (((x>>3)&3) + 9)*srcStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3652 prefetchw(dstBlock + (((x>>3)&3) + 5)*dstStride + 32);
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3653 prefetchw(dstBlock + (((x>>3)&3) + 9)*dstStride + 32);
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3654 */
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3655 #endif
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3656
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3657 RENAME(blockCopy)(dstBlock + dstStride*copyAhead, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3658 srcBlock + srcStride*copyAhead, srcStride, mode & LEVEL_FIX, &c.packedYOffset);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3659
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3660 if(mode & LINEAR_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3661 RENAME(deInterlaceInterpolateLinear)(dstBlock, dstStride);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3662 else if(mode & LINEAR_BLEND_DEINT_FILTER)
1581
d2fc92d02bf7 linear blend 1 line shift fix
michael
parents: 1331
diff changeset
3663 RENAME(deInterlaceBlendLinear)(dstBlock, dstStride, c.deintTemp + x);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3664 else if(mode & MEDIAN_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3665 RENAME(deInterlaceMedian)(dstBlock, dstStride);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3666 else if(mode & CUBIC_IPOL_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3667 RENAME(deInterlaceInterpolateCubic)(dstBlock, dstStride);
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3668 else if(mode & FFMPEG_DEINT_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3669 RENAME(deInterlaceFF)(dstBlock, dstStride, c.deintTemp + x);
1157
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3670 else if(mode & LOWPASS5_DEINT_FILTER)
57fe9c4e0c6e fixing cliping of c deinterlacers
michaelni
parents: 1109
diff changeset
3671 RENAME(deInterlaceL5)(dstBlock, dstStride, c.deintTemp + x, c.deintTemp + width + x);
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3672 /* else if(mode & CUBIC_BLEND_DEINT_FILTER)
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3673 RENAME(deInterlaceBlendCubic)(dstBlock, dstStride);
106
389391a6d0bf rewrote the horizontal lowpass filter to fix a bug which caused a blocky look
michael
parents: 105
diff changeset
3674 */
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3675
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3676 /* only deblock if we have 2 blocks */
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3677 if(y + 8 < height)
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3678 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3679 if(mode & V_X1_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3680 RENAME(vertX1Filter)(dstBlock, stride, &c);
115
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3681 else if(mode & V_DEBLOCK)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3682 {
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3683 const int t= RENAME(vertClassify)(dstBlock, stride, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3684
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3685 if(t==1)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3686 RENAME(doVertLowPass)(dstBlock, stride, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3687 else if(t==2)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3688 RENAME(doVertDefFilter)(dstBlock, stride, &c);
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3689 }else if(mode & V_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3690 RENAME(do_a_deblock)(dstBlock, stride, 1, &c);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3691 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3692 }
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3693
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3694 #ifdef HAVE_MMX
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3695 RENAME(transpose1)(tempBlock1, tempBlock2, dstBlock, dstStride);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3696 #endif
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3697 /* check if we have a previous block to deblock it with dstBlock */
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3698 if(x - 8 >= 0)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3699 {
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3700 #ifdef HAVE_MMX
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3701 if(mode & H_X1_FILTER)
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3702 RENAME(vertX1Filter)(tempBlock1, 16, &c);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3703 else if(mode & H_DEBLOCK)
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3704 {
1327
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3705 //START_TIMER
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3706 const int t= RENAME(vertClassify)(tempBlock1, 16, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3707 //STOP_TIMER("dc & minmax")
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3708 if(t==1)
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3709 RENAME(doVertLowPass)(tempBlock1, 16, &c);
854571532c89 blinking blocks around thin vertical lines and dots bugfix
michaelni
parents: 1196
diff changeset
3710 else if(t==2)
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3711 RENAME(doVertDefFilter)(tempBlock1, 16, &c);
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3712 }else if(mode & H_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3713 RENAME(do_a_deblock)(tempBlock1, 16, 1, &c);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3714 }
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3715
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3716 RENAME(transpose2)(dstBlock-4, dstStride, tempBlock1 + 4*16);
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3717
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3718 #else
115
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3719 if(mode & H_X1_FILTER)
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3720 horizX1Filter(dstBlock-4, stride, QP);
4514b8e7f0f1 more logic behavior if the altenative deblock filters are used (turning a alt filter on without turning the deblock filter on uses the alt filter instead of using no filter now)
michael
parents: 113
diff changeset
3721 else if(mode & H_DEBLOCK)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3722 {
2043
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3723 #ifdef HAVE_ALTIVEC
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3724 unsigned char __attribute__ ((aligned(16))) tempBlock[272];
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3725 transpose_16x8_char_toPackedAlign_altivec(tempBlock, dstBlock - (4 + 1), stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3726
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3727 const int t=vertClassify_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3728 if(t==1) {
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3729 doVertLowPass_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3730 transpose_8x16_char_fromPackedAlign_altivec(dstBlock - (4 + 1), tempBlock, stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3731 }
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3732 else if(t==2) {
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3733 doVertDefFilter_altivec(tempBlock-48, 16, &c);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3734 transpose_8x16_char_fromPackedAlign_altivec(dstBlock - (4 + 1), tempBlock, stride);
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3735 }
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3736 #else
2036
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3737 const int t= RENAME(horizClassify)(dstBlock-4, stride, &c);
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3738
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3739 if(t==1)
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3740 RENAME(doHorizLowPass)(dstBlock-4, stride, &c);
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3741 else if(t==2)
6a6c678517b3 altivec optimizations and horizontal filter fix by (Romain Dolbeau <dolbeau at irisa dot fr>)
michael
parents: 2031
diff changeset
3742 RENAME(doHorizDefFilter)(dstBlock-4, stride, &c);
2043
703b80c99891 Another (final?) patch for libpostproc.
michael
parents: 2041
diff changeset
3743 #endif
2037
98d8283534bb accurate/slow (per line instead of per block) deblock filter spport which is identical to what is recommanded in the mpeg4 spec
michael
parents: 2036
diff changeset
3744 }else if(mode & H_A_DEBLOCK){
2039
f25e485a7850 mmx optimized version of the per line/accurate deblock filter
michael
parents: 2038
diff changeset
3745 RENAME(do_a_deblock)(dstBlock-8, 1, stride, &c);
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3746 }
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3747 #endif
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3748 if(mode & DERING)
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3749 {
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3750 //FIXME filter first line
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3751 if(y>0) RENAME(dering)(dstBlock - stride - 8, stride, &c);
130
0cce5d30d1d8 dering in mmx2
michael
parents: 129
diff changeset
3752 }
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3753
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3754 if(mode & TEMP_NOISE_FILTER)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3755 {
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3756 RENAME(tempNoiseReducer)(dstBlock-8, stride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3757 c.tempBlured[isColor] + y*dstStride + x,
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3758 c.tempBluredPast[isColor] + (y>>3)*256 + (x>>3),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3759 c.ppMode.maxTmpNoise);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3760 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3761 }
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3762
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3763 dstBlock+=8;
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3764 srcBlock+=8;
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3765
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3766 #ifdef HAVE_MMX
128
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3767 tmpXchg= tempBlock1;
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3768 tempBlock1= tempBlock2;
e5266b8e79be much better horizontal filters (transpose & use the vertical ones) :)
michael
parents: 126
diff changeset
3769 tempBlock2 = tmpXchg;
129
be35346e27c1 fixed difference with -vo md5 between doVertDefFilter() C and MMX / MMX2 versions
michael
parents: 128
diff changeset
3770 #endif
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3771 }
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3772
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3773 if(mode & DERING)
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3774 {
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3775 if(y > 0) RENAME(dering)(dstBlock - dstStride - 8, dstStride, &c);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3776 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3777
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3778 if((mode & TEMP_NOISE_FILTER))
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3779 {
169
20bcd5b70886 runtime cpu detection
michael
parents: 168
diff changeset
3780 RENAME(tempNoiseReducer)(dstBlock-8, dstStride,
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3781 c.tempBlured[isColor] + y*dstStride + x,
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3782 c.tempBluredPast[isColor] + (y>>3)*256 + (x>>3),
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3783 c.ppMode.maxTmpNoise);
156
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3784 }
c09459686be3 temporal noise reducer in C (-pp 0x100000)
michael
parents: 152
diff changeset
3785
142
da4c751fc151 deinterlace bugfix
michael
parents: 141
diff changeset
3786 /* did we use a tmp buffer for the last lines? */
112
a2c063b6ecf9 fixed a bug in the tmp buffer
michael
parents: 111
diff changeset
3787 if(y+15 >= height)
111
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3788 {
8e4c5a16c9fc fixed the height%8!=0 bug
michael
parents: 109
diff changeset
3789 uint8_t *dstBlock= &(dst[y*dstStride]);
2527
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3790 if(width==ABS(dstStride))
ace6e273f318 support for negative strides
henry
parents: 2295
diff changeset
3791 linecpy(dstBlock, tempDst + dstStride, height-y, dstStride);
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3792 else
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3793 {
944
927c246f1f6d 10l another int i missing (without ^M)
faust3
parents: 943
diff changeset
3794 int i;
941
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3795 for(i=0; i<height-y; i++)
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3796 {
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3797 memcpy(dstBlock + i*dstStride, tempDst + (i+1)*dstStride, width);
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3798 }
1e4ab5fdfca1 cleaning corners of green dirt ;)
michael
parents: 886
diff changeset
3799 }
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3800 }
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3801 /*
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3802 for(x=0; x<width; x+=32)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3803 {
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3804 volatile int i;
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3805 i+= + dstBlock[x + 7*dstStride] + dstBlock[x + 8*dstStride]
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3806 + dstBlock[x + 9*dstStride] + dstBlock[x +10*dstStride]
164
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3807 + dstBlock[x +11*dstStride] + dstBlock[x +12*dstStride];
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3808 // + dstBlock[x +13*dstStride]
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3809 // + dstBlock[x +14*dstStride] + dstBlock[x +15*dstStride];
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3810 }*/
dedb3aef2bee cleanup
michael
parents: 163
diff changeset
3811 }
96
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3812 #ifdef HAVE_3DNOW
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3813 asm volatile("femms");
29ac11dc53d3 fixed a bug in the horizontal default filter
arpi
parents: 95
diff changeset
3814 #elif defined (HAVE_MMX)
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3815 asm volatile("emms");
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3816 #endif
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3817
163
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3818 #ifdef DEBUG_BRIGHTNESS
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3819 if(!isColor)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3820 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3821 int max=1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3822 int i;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3823 for(i=0; i<256; i++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3824 if(yHistogram[i] > max) max=yHistogram[i];
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3825
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3826 for(i=1; i<256; i++)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3827 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3828 int x;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3829 int start=yHistogram[i-1]/(max/256+1);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3830 int end=yHistogram[i]/(max/256+1);
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3831 int inc= end > start ? 1 : -1;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3832 for(x=start; x!=end+inc; x+=inc)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3833 dst[ i*dstStride + x]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3834 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3835
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3836 for(i=0; i<100; i+=2)
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3837 {
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3838 dst[ (white)*dstStride + i]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3839 dst[ (black)*dstStride + i]+=128;
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3840 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3841
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3842 }
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3843 #endif
32e7f17a04a7 faster mmx2 / 3dnow deblocking filter
michael
parents: 158
diff changeset
3844
787
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3845 *c2= c; //copy local context back
4914252c963a postprocessing cleanup:
michael
parents: 634
diff changeset
3846
95
8bce253b537c new postprocess code by Michael Niedermayer (michaelni@gmx.at)
arpi
parents:
diff changeset
3847 }