From: Nikos Mavrogiannopoulos
Subject: [SCM] GNU gnutls branch, master, updated. gnutls_2_99_2-27-g5f84e48
Date: Sun, 29 May 2011 22:17:46 +0000

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU gnutls".

http://git.savannah.gnu.org/cgit/gnutls.git/commit/?id=5f84e48a3f8ae92181f6031bf211989f6c54add2

The branch, master has been updated
       via  5f84e48a3f8ae92181f6031bf211989f6c54add2 (commit)
       via  a6219a5918cff2431cc83cb06a2929a7853a2bed (commit)
       via  b983b8638fba58ad76f98423a23566442af72dc9 (commit)
       via  8dc2a74cbdad286b6a97d55b2a47929f07e44aa7 (commit)
       via  23df2cf3d4e719b51d6be784b0249b68139d1668 (commit)
       via  b50b4b052bb9cd455615c2ed784bc419cae6719c (commit)
      from  14f27b4e2488f82eeaf05b78073daedb0712a76f (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 5f84e48a3f8ae92181f6031bf211989f6c54add2
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 23:34:55 2011 +0200

    Added new AES code by Andy.

commit a6219a5918cff2431cc83cb06a2929a7853a2bed
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 12:39:46 2011 +0200

    Added missing file.

commit b983b8638fba58ad76f98423a23566442af72dc9
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 12:39:44 2011 +0200

    more files to ignore

commit 8dc2a74cbdad286b6a97d55b2a47929f07e44aa7
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 12:35:57 2011 +0200

    Added FSF copyright to public domain files.

commit 23df2cf3d4e719b51d6be784b0249b68139d1668
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 12:01:16 2011 +0200

    Use cpuid.h if it exists, to use the x86 CPUID instruction.

commit b50b4b052bb9cd455615c2ed784bc419cae6719c
Author: Nikos Mavrogiannopoulos <address@hidden>
Date:   Sun May 29 01:40:16 2011 +0200

    Added Dash.

-----------------------------------------------------------------------
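
Note on the head commit (5f84e48, "Added new AES code by Andy"): the new
appro-aes-x86-64.s/appro-aes-x86.s files below are Andy Polyakov's AES-NI
assembly; the .byte 102,15,56,220/221,... sequences encode the
aesenc/aesenclast instructions. As orientation only, the same single-block
pattern (initial AddRoundKey, repeated aesenc, final aesenclast) written
with compiler intrinsics looks roughly like the sketch below; it assumes
<wmmintrin.h> and AES-128 round keys expanded elsewhere, and is not code
from this push.

    #include <wmmintrin.h>              /* AES-NI intrinsics (gcc -maes) */

    /* Illustrative sketch: whitening XOR, 9 aesenc rounds, one final
     * aesenclast, mirroring the .Loop_enc1_* pattern in the new assembly.
     * rk[] holds the 11 expanded AES-128 round keys. */
    static void aes128_encrypt_block(const __m128i rk[11],
                                     const unsigned char in[16],
                                     unsigned char out[16])
    {
        __m128i b = _mm_loadu_si128((const __m128i *) in);
        b = _mm_xor_si128(b, rk[0]);            /* round 0 (AddRoundKey)  */
        for (int i = 1; i < 10; i++)
            b = _mm_aesenc_si128(b, rk[i]);     /* rounds 1..9            */
        b = _mm_aesenclast_si128(b, rk[10]);    /* round 10, no MixColumns */
        _mm_storeu_si128((__m128i *) out, b);
    }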

Summary of changes:
 .gitignore                                   |    5 +
 THANKS                                       |    1 +
 configure.ac                                 |    1 +
 doc/credentials/x509/ca-key.pem              |  145 ++
 lib/accelerated/intel/asm/appro-aes-x86-64.s | 2416 ++++++++++++++++++++++----
 lib/accelerated/intel/asm/appro-aes-x86.s    | 2359 ++++++++++++++++++++------
 lib/accelerated/x86.h                        |    9 +
 lib/nettle/Makefile.am                       |    3 +-
 lib/nettle/ecc_free.c                        |   30 +-
 lib/nettle/ecc_make_key.c                    |   30 +-
 lib/nettle/ecc_map.c                         |   30 +-
 lib/nettle/ecc_mulmod.c                      |   30 +-
 lib/nettle/ecc_points.c                      |   30 +-
 lib/nettle/ecc_projective_add_point.c        |   30 +-
 lib/nettle/ecc_projective_dbl_point_3.c      |   30 +-
 lib/nettle/ecc_shared_secret.c               |   30 +-
 lib/nettle/ecc_sign_hash.c                   |   30 +-
 lib/nettle/ecc_test.c                        |  142 --
 lib/nettle/ecc_verify_hash.c                 |   30 +-
 19 files changed, 4331 insertions(+), 1050 deletions(-)
 create mode 100644 doc/credentials/x509/ca-key.pem
 delete mode 100644 lib/nettle/ecc_test.c

diff --git a/.gitignore b/.gitignore
index e00238b..68a55b9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -455,3 +455,8 @@ tests/suite/x509paths/X509tests
 tests/x509cert
 src/benchmark-cipher
 src/benchmark-tls
+doc/gnutls-guile.html
+doc/version-guile.texi
+build-aux/compile
+doc/stamp-1
+lib/algorithms/libgnutls_alg.la
diff --git a/THANKS b/THANKS
index ef6cb28..c01a0ed 100644
--- a/THANKS
+++ b/THANKS
@@ -113,6 +113,7 @@ Michael Rommel                      <rommel [at] layer-7.net>
 Mark Brand                     <mabrand [at] mabrand.nl>
 Vitaly Kruglikov               <vitaly.kruglikov [at] palm.com>
 Kalle Olavi Niemitalo          <kon [at] iki.fi>
+Dash Shendy                     <admin [at] dash.za.net>
 
 ----------------------------------------------------------------------
 Copying and distribution of this file, with or without modification,
diff --git a/configure.ac b/configure.ac
index 95eb972..00f4a7e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -81,6 +81,7 @@ case $host_cpu in
   i?86 | x86_64 | amd64)
 dnl    GCC_FLAG_ADD([-maes -mpclmul],[X86])
 dnl    if test "x$X86" = "xyes";then
+      AC_CHECK_HEADERS(cpuid.h)
       if test "$host_cpu" = "x86_64" -o "$host_cpu" = "amd64";then
         hw_accel="x86-64"
       else
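
The AC_CHECK_HEADERS(cpuid.h) test added above backs the "Use cpuid.h if it
exists" commit: when the header is present, autoconf defines HAVE_CPUID_H and
the library can query CPU feature bits through GCC's wrapper instead of
open-coding the CPUID instruction. A sketch of such a run-time check (the
helper name here is illustrative, not the function added by this commit):

    #ifdef HAVE_CPUID_H
    # include <cpuid.h>                 /* __get_cpuid() wrapper */
    #endif

    /* Non-zero if the CPU advertises AES-NI: CPUID leaf 1, ECX bit 25. */
    static int have_aesni(void)
    {
    #ifdef HAVE_CPUID_H
      unsigned int eax, ebx, ecx, edx;

      if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (ecx & (1u << 25)) != 0;
    #endif
      return 0;
    }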
diff --git a/doc/credentials/x509/ca-key.pem b/doc/credentials/x509/ca-key.pem
new file mode 100644
index 0000000..4efbe5a
--- /dev/null
+++ b/doc/credentials/x509/ca-key.pem
@@ -0,0 +1,145 @@
+Public Key Info:
+       Public Key Algorithm: RSA
+       Key Security Level: Normal
+
+modulus:
+       00:9c:e4:42:b1:7d:6e:9e:5f:ff:7f:2d:9d:d7:4e:
+       78:5d:db:88:83:fd:c2:a9:50:5a:4f:71:dc:6b:ae:
+       52:12:80:f0:87:42:a2:3e:d4:28:3a:06:4b:74:a6:
+       36:72:86:c6:b3:fa:23:62:d3:a3:72:cd:0a:9e:53:
+       d8:76:6b:63:12:1e:96:12:1b:89:53:de:6f:e1:34:
+       1d:0b:83:8b:32:21:39:e9:e2:06:ab:6e:76:85:90:
+       1b:1e:84:cb:f3:84:35:e0:3c:50:58:6b:b3:40:af:
+       37:d2:29:a5:ed:f6:f0:d9:67:08:71:14:3c:bc:51:
+       ac:f1:2c:df:5f:0e:b7:f8:c2:3a:16:ae:a2:30:04:
+       08:a8:fd:3c:5b:31:a6:45:1c:cb:e7:0b:c2:88:f8:
+       42:56:4a:cf:9b:06:d7:a0:00:6e:6f:a0:00:b1:8c:
+       16:3c:90:7d:d4:cf:7f:97:1e:60:14:7e:64:f7:f8:
+       8f:7e:2d:ec:d8:a8:37:17:c3:0e:72:9a:6a:15:88:
+       f1:0d:29:ec:7e:2c:fa:78:c8:75:f9:b6:15:20:0a:
+       37:eb:bb:c6:55:81:e2:81:73:04:64:2d:85:7b:39:
+       70:20:76:99:ce:91:28:16:56:37:6b:b2:c5:27:4d:
+       32:ae:34:3d:d7:4a:fc:50:4f:82:10:c4:d8:cc:4e:
+       34:0f:4a:25:08:ca:3b:14:0f:51:0a:37:8e:dd:b5:
+       08:a1:86:88:75:54:d4:19:61:06:1d:64:9e:a3:11:
+       9e:8b:d1:a4:9b:ab:be:01:28:fc:7f:e8:b4:8f:17:
+       43:da:a5:ec:7b:
+public exponent:
+       01:00:01:
+private exponent:
+       6a:cd:04:0d:99:0a:65:6b:8a:1c:c4:2b:cf:b6:8e:
+       3f:ae:43:47:3e:c6:75:c5:ca:44:8c:88:f5:10:8c:
+       b4:25:ec:16:d7:a8:64:c6:bd:bf:8a:2b:71:73:f8:
+       5a:8c:1e:d5:c3:b0:b5:04:c7:1e:4e:30:2d:49:7c:
+       70:58:77:ef:8c:bc:b2:04:e6:be:1e:0c:e1:2c:3d:
+       9d:69:e5:a6:b1:71:a0:22:0a:52:46:f7:0d:c2:e4:
+       83:28:f9:41:83:3d:bd:b0:b1:2d:0f:db:cd:6b:b9:
+       bf:2a:34:d7:42:24:00:8a:9f:f7:82:44:3a:1a:0b:
+       75:7e:0b:6c:c5:33:3d:76:d2:5e:40:71:0d:e8:a1:
+       10:90:9a:b6:a5:9c:bf:2d:74:2c:8b:17:d9:6f:ce:
+       90:b8:79:79:dd:14:4a:bc:87:96:24:81:5a:14:6b:
+       cf:16:b2:94:5e:b7:7b:cc:cc:4a:a9:8e:e3:a9:c3:
+       70:51:1f:03:f6:f0:92:1f:1e:39:9a:58:05:e0:9c:
+       0c:4e:06:4a:6a:31:23:e6:21:bf:0a:ec:8f:31:a0:
+       c9:24:e2:cd:ff:fa:25:fa:1c:bf:4f:22:c6:e5:0f:
+       52:8d:95:ab:1f:58:30:20:f1:2b:ea:df:c4:af:b5:
+       7e:10:c5:4f:16:72:3f:f5:2e:88:3c:51:23:37:20:
+       7c:55:d4:bb:d7:23:6a:b0:14:81:a4:c1:6b:06:3b:
+       28:17:e9:80:dd:1a:e5:d6:bb:0d:30:cb:6a:34:9b:
+       23:ae:49:49:42:24:b8:7f:72:f6:e9:4a:c9:75:2b:
+       7f:ac:40:b1:
+prime1:
+       00:d0:9c:a7:0f:3a:c4:ec:84:3d:92:22:39:ef:3e:
+       81:27:8a:5e:bf:01:7d:69:78:e8:ec:af:62:cf:c0:
+       ec:1d:f0:38:f4:f9:e5:ab:bc:aa:a2:5c:78:fa:23:
+       0d:03:9c:7b:29:3c:6f:26:91:c9:a4:31:41:72:63:
+       76:65:02:0d:f1:56:0f:b0:70:ef:be:6e:97:bb:f6:
+       ed:57:b6:02:16:eb:83:f6:c9:f6:ce:51:d2:91:b6:
+       a1:85:83:b9:da:da:29:b1:eb:23:6a:dd:3d:cc:1f:
+       40:e2:f2:68:db:be:7f:2a:4f:2b:5b:ed:ad:ff:c8:
+       ef:16:9c:15:68:71:24:8c:44:bb:58:17:0d:f2:fa:
+       b7:ca:e6:f1:b3:5e:45:fc:3a:56:82:44:95:d5:15:
+       90:c9:d3:
+prime2:
+       00:c0:87:ef:09:79:4e:4a:ea:23:86:c7:10:3e:59:
+       90:8e:f0:32:ff:8a:9d:8f:5c:dc:2c:5a:99:6a:46:
+       04:dd:c2:0d:41:f0:3c:71:78:95:fc:10:da:90:9d:
+       1a:f8:f5:27:eb:26:2b:44:c2:b1:64:27:2c:3f:f4:
+       03:98:e9:b7:34:70:69:69:7b:bc:c9:85:b8:8b:e3:
+       45:a0:44:90:b9:3f:bf:76:b8:a1:29:a6:05:63:cb:
+       03:a2:8a:06:31:ce:b4:15:89:7f:ee:e5:ce:89:da:
+       8c:e6:0f:38:43:1e:cc:dc:58:f3:73:19:1d:82:9c:
+       0e:fa:f2:a8:ad:ab:91:09:06:fc:a6:10:cd:82:be:
+       4a:fb:3c:b2:92:0b:24:cf:6d:02:2e:0d:4a:52:aa:
+       34:c1:b9:
+coefficient:
+       00:86:2e:30:76:ad:fd:d3:00:ab:06:e6:bf:aa:db:
+       1f:49:8a:23:7c:b4:be:b3:fa:ff:5a:7a:d7:09:2c:
+       ad:ed:d2:0c:7d:a8:bc:e3:a4:a3:8d:10:0e:47:a3:
+       ad:5d:66:3b:58:35:55:95:53:3d:1f:5e:0a:db:10:
+       32:b6:0a:8f:e0:0c:4b:8c:e6:94:ef:5e:ba:cb:b3:
+       d0:b2:88:a3:d6:ff:16:0e:60:59:fe:0b:43:03:6f:
+       ea:57:54:9b:cd:1c:2a:e6:57:3f:f2:d4:81:dd:07:
+       f3:dc:39:53:1c:09:f9:bf:0f:f6:5c:8e:2f:e0:aa:
+       f7:b8:58:4b:21:3f:5d:2f:08:24:e4:3a:3b:52:6f:
+       28:3c:ee:29:f5:03:be:8b:93:9a:f1:ac:ce:12:ac:
+       fe:7f:32:
+exp1:
+       00:a7:07:16:77:8a:2d:8b:d5:e1:da:74:8f:00:70:
+       82:46:9f:72:76:ea:81:78:86:77:b0:b2:48:a2:61:
+       2c:6c:58:1f:b2:7d:b7:97:86:ca:f4:8e:a7:ca:57:
+       70:1f:19:16:3f:91:04:c9:d3:e6:a8:11:4b:fe:83:
+       86:93:1f:4e:fc:91:54:a4:87:f8:5c:f7:fd:83:61:
+       14:ed:aa:6c:07:df:f0:5c:13:9f:09:d8:d7:89:15:
+       ba:43:c5:91:74:9a:42:d2:12:9b:db:ff:62:70:62:
+       01:b8:f4:30:62:e9:26:b6:40:87:4d:e6:82:ef:8e:
+       f9:67:97:f7:48:15:77:16:dc:1d:48:4d:c5:3c:6b:
+       e3:e6:90:7c:ab:89:ea:ed:25:e4:88:0e:d4:0c:b5:
+       64:a5:43:
+exp2:
+       7a:14:b7:c9:b6:15:a3:03:1c:4b:d5:e5:c2:e3:5f:
+       fa:82:ec:93:84:fd:ab:6e:22:5e:2d:84:a2:12:8b:
+       fb:61:94:ae:7e:fa:94:a8:f5:d1:c3:8e:13:ac:ca:
+       f1:99:e2:1a:05:35:e2:7f:e1:a3:b4:03:26:fa:3f:
+       5d:b2:b4:ec:97:6a:ff:eb:ea:25:8e:99:1a:7a:9e:
+       27:a5:d2:6e:e4:b1:2f:42:9b:4e:a1:6b:41:7f:f5:
+       6a:17:43:1e:4a:07:7e:b0:95:62:92:6d:88:94:00:
+       4b:d0:d2:c8:1c:bb:a1:ec:f5:51:c2:57:27:fe:74:
+       b1:43:35:1a:0a:74:08:d9:59:52:a3:cc:ec:5e:65:
+       85:31:53:b9:af:3f:44:17:c7:0e:14:77:50:3b:85:
+       00:61:
+
+Public Key ID: 4D:56:B7:6A:00:58:F1:67:92:F4:A6:75:55:1B:8E:53:01:03:EF:CF
+
+-----BEGIN RSA PRIVATE KEY-----
+MIIFfAIBAAKCATEAnORCsX1unl//fy2d1054XduIg/3CqVBaT3Hca65SEoDwh0Ki
+PtQoOgZLdKY2cobGs/ojYtOjcs0KnlPYdmtjEh6WEhuJU95v4TQdC4OLMiE56eIG
+q252hZAbHoTL84Q14DxQWGuzQK830iml7fbw2WcIcRQ8vFGs8SzfXw63+MI6Fq6i
+MAQIqP08WzGmRRzL5wvCiPhCVkrPmwbXoABub6AAsYwWPJB91M9/lx5gFH5k9/iP
+fi3s2Kg3F8MOcppqFYjxDSnsfiz6eMh1+bYVIAo367vGVYHigXMEZC2FezlwIHaZ
+zpEoFlY3a7LFJ00yrjQ910r8UE+CEMTYzE40D0olCMo7FA9RCjeO3bUIoYaIdVTU
+GWEGHWSeoxGei9Gkm6u+ASj8f+i0jxdD2qXsewIDAQABAoIBMGrNBA2ZCmVrihzE
+K8+2jj+uQ0c+xnXFykSMiPUQjLQl7BbXqGTGvb+KK3Fz+FqMHtXDsLUExx5OMC1J
+fHBYd++MvLIE5r4eDOEsPZ1p5aaxcaAiClJG9w3C5IMo+UGDPb2wsS0P281rub8q
+NNdCJACKn/eCRDoaC3V+C2zFMz120l5AcQ3ooRCQmralnL8tdCyLF9lvzpC4eXnd
+FEq8h5YkgVoUa88WspRet3vMzEqpjuOpw3BRHwP28JIfHjmaWAXgnAxOBkpqMSPm
+Ib8K7I8xoMkk4s3/+iX6HL9PIsblD1KNlasfWDAg8Svq38SvtX4QxU8Wcj/1Log8
+USM3IHxV1LvXI2qwFIGkwWsGOygX6YDdGuXWuw0wy2o0myOuSUlCJLh/cvbpSsl1
+K3+sQLECgZkA0JynDzrE7IQ9kiI57z6BJ4pevwF9aXjo7K9iz8DsHfA49Pnlq7yq
+olx4+iMNA5x7KTxvJpHJpDFBcmN2ZQIN8VYPsHDvvm6Xu/btV7YCFuuD9sn2zlHS
+kbahhYO52topsesjat09zB9A4vJo275/Kk8rW+2t/8jvFpwVaHEkjES7WBcN8vq3
+yubxs15F/DpWgkSV1RWQydMCgZkAwIfvCXlOSuojhscQPlmQjvAy/4qdj1zcLFqZ
+akYE3cINQfA8cXiV/BDakJ0a+PUn6yYrRMKxZCcsP/QDmOm3NHBpaXu8yYW4i+NF
+oESQuT+/drihKaYFY8sDoooGMc60FYl/7uXOidqM5g84Qx7M3FjzcxkdgpwO+vKo
+rauRCQb8phDNgr5K+zyykgskz20CLg1KUqo0wbkCgZkApwcWd4oti9Xh2nSPAHCC
+Rp9yduqBeIZ3sLJIomEsbFgfsn23l4bK9I6nyldwHxkWP5EEydPmqBFL/oOGkx9O
+/JFUpIf4XPf9g2EU7apsB9/wXBOfCdjXiRW6Q8WRdJpC0hKb2/9icGIBuPQwYukm
+tkCHTeaC7475Z5f3SBV3FtwdSE3FPGvj5pB8q4nq7SXkiA7UDLVkpUMCgZh6FLfJ
+thWjAxxL1eXC41/6guyThP2rbiJeLYSiEov7YZSufvqUqPXRw44TrMrxmeIaBTXi
+f+GjtAMm+j9dsrTsl2r/6+oljpkaep4npdJu5LEvQptOoWtBf/VqF0MeSgd+sJVi
+km2IlABL0NLIHLuh7PVRwlcn/nSxQzUaCnQI2VlSo8zsXmWFMVO5rz9EF8cOFHdQ
+O4UAYQKBmQCGLjB2rf3TAKsG5r+q2x9JiiN8tL6z+v9aetcJLK3t0gx9qLzjpKON
+EA5Ho61dZjtYNVWVUz0fXgrbEDK2Co/gDEuM5pTvXrrLs9CyiKPW/xYOYFn+C0MD
+b+pXVJvNHCrmVz/y1IHdB/PcOVMcCfm/D/Zcji/gqve4WEshP10vCCTkOjtSbyg8
+7in1A76Lk5rxrM4SrP5/Mg==
+-----END RSA PRIVATE KEY-----
diff --git a/lib/accelerated/intel/asm/appro-aes-x86-64.s b/lib/accelerated/intel/asm/appro-aes-x86-64.s
index 96b7b6e..e6db040 100644
--- a/lib/accelerated/intel/asm/appro-aes-x86-64.s
+++ b/lib/accelerated/intel/asm/appro-aes-x86-64.s
@@ -5,18 +5,19 @@
 # modification, are permitted provided that the following conditions
 # are met:
 # 
-#     *        Redistributions of source code must retain copyright notices,
-#      this list of conditions and the following disclaimer.
+#     *        Redistributions of source code must retain copyright
+#     * notices,
+#      this list of conditions and the following disclaimer.
 #
-#     *        Redistributions in binary form must reproduce the above
-#      copyright notice, this list of conditions and the following
-#      disclaimer in the documentation and/or other materials
-#      provided with the distribution.
+#     *        Redistributions in binary form must reproduce the above
+#      copyright notice, this list of conditions and the following
+#      disclaimer in the documentation and/or other materials
+#      provided with the distribution.
 #
-#     *        Neither the name of the Andy Polyakov nor the names of its
-#      copyright holder and contributors may be used to endorse or
-#      promote products derived from this software without specific
-#      prior written permission.
+#     *        Neither the name of the Andy Polyakov nor the names of its
+#      copyright holder and contributors may be used to endorse or
+#      promote products derived from this software without specific
+#      prior written permission.
 #
 # ALTERNATIVELY, provided that this notice is retained in full, this
 # product may be distributed under the terms of the GNU General Public
@@ -40,20 +41,20 @@
 .type  aesni_encrypt,@function
 .align 16
 aesni_encrypt:
-       movups  (%rdi),%xmm0
+       movups  (%rdi),%xmm2
        movl    240(%rdx),%eax
-       movaps  (%rdx),%xmm4
-       movaps  16(%rdx),%xmm5
+       movaps  (%rdx),%xmm0
+       movaps  16(%rdx),%xmm1
        leaq    32(%rdx),%rdx
-       pxor    %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
 .Loop_enc1_1:
-.byte  102,15,56,220,197
+.byte  102,15,56,220,209
        decl    %eax
-       movaps  (%rdx),%xmm5
+       movaps  (%rdx),%xmm1
        leaq    16(%rdx),%rdx
        jnz     .Loop_enc1_1    
-.byte  102,15,56,221,197
-       movups  %xmm0,(%rsi)
+.byte  102,15,56,221,209
+       movups  %xmm2,(%rsi)
        .byte   0xf3,0xc3
 .size  aesni_encrypt,.-aesni_encrypt
 
@@ -61,318 +62,1941 @@ aesni_encrypt:
 .type  aesni_decrypt,@function
 .align 16
 aesni_decrypt:
-       movups  (%rdi),%xmm0
+       movups  (%rdi),%xmm2
        movl    240(%rdx),%eax
-       movaps  (%rdx),%xmm4
-       movaps  16(%rdx),%xmm5
+       movaps  (%rdx),%xmm0
+       movaps  16(%rdx),%xmm1
        leaq    32(%rdx),%rdx
-       pxor    %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
 .Loop_dec1_2:
-.byte  102,15,56,222,197
+.byte  102,15,56,222,209
        decl    %eax
-       movaps  (%rdx),%xmm5
+       movaps  (%rdx),%xmm1
        leaq    16(%rdx),%rdx
        jnz     .Loop_dec1_2    
-.byte  102,15,56,223,197
-       movups  %xmm0,(%rsi)
+.byte  102,15,56,223,209
+       movups  %xmm2,(%rsi)
        .byte   0xf3,0xc3
 .size  aesni_decrypt, .-aesni_decrypt
 .type  _aesni_encrypt3,@function
 .align 16
 _aesni_encrypt3:
-       movaps  (%rcx),%xmm4
+       movaps  (%rcx),%xmm0
        shrl    $1,%eax
-       movaps  16(%rcx),%xmm5
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-       pxor    %xmm4,%xmm1
-       pxor    %xmm4,%xmm2
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+       xorps   %xmm0,%xmm4
+       movaps  (%rcx),%xmm0
 
 .Lenc_loop3:
-.byte  102,15,56,220,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,220,205
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
        decl    %eax
-.byte  102,15,56,220,213
-.byte  102,15,56,220,196
-       movaps  16(%rcx),%xmm5
-.byte  102,15,56,220,204
+.byte  102,15,56,220,225
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
        leaq    32(%rcx),%rcx
-.byte  102,15,56,220,212
+.byte  102,15,56,220,224
+       movaps  (%rcx),%xmm0
        jnz     .Lenc_loop3
 
-.byte  102,15,56,220,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,220,205
-.byte  102,15,56,220,213
-.byte  102,15,56,221,196
-.byte  102,15,56,221,204
-.byte  102,15,56,221,212
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
        .byte   0xf3,0xc3
 .size  _aesni_encrypt3,.-_aesni_encrypt3
 .type  _aesni_decrypt3,@function
 .align 16
 _aesni_decrypt3:
-       movaps  (%rcx),%xmm4
+       movaps  (%rcx),%xmm0
        shrl    $1,%eax
-       movaps  16(%rcx),%xmm5
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-       pxor    %xmm4,%xmm1
-       pxor    %xmm4,%xmm2
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+       xorps   %xmm0,%xmm4
+       movaps  (%rcx),%xmm0
 
 .Ldec_loop3:
-.byte  102,15,56,222,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,222,205
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
        decl    %eax
-.byte  102,15,56,222,213
-.byte  102,15,56,222,196
-       movaps  16(%rcx),%xmm5
-.byte  102,15,56,222,204
+.byte  102,15,56,222,225
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
        leaq    32(%rcx),%rcx
-.byte  102,15,56,222,212
+.byte  102,15,56,222,224
+       movaps  (%rcx),%xmm0
        jnz     .Ldec_loop3
 
-.byte  102,15,56,222,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,222,205
-.byte  102,15,56,222,213
-.byte  102,15,56,223,196
-.byte  102,15,56,223,204
-.byte  102,15,56,223,212
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
        .byte   0xf3,0xc3
 .size  _aesni_decrypt3,.-_aesni_decrypt3
 .type  _aesni_encrypt4,@function
 .align 16
 _aesni_encrypt4:
-       movaps  (%rcx),%xmm4
+       movaps  (%rcx),%xmm0
        shrl    $1,%eax
-       movaps  16(%rcx),%xmm5
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-       pxor    %xmm4,%xmm1
-       pxor    %xmm4,%xmm2
-       pxor    %xmm4,%xmm3
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+       xorps   %xmm0,%xmm4
+       xorps   %xmm0,%xmm5
+       movaps  (%rcx),%xmm0
 
 .Lenc_loop4:
-.byte  102,15,56,220,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,220,205
-       decl    %eax
-.byte  102,15,56,220,213
-.byte  102,15,56,220,221
-.byte  102,15,56,220,196
-       movaps  16(%rcx),%xmm5
-.byte  102,15,56,220,204
-       leaq    32(%rcx),%rcx
-.byte  102,15,56,220,212
-.byte  102,15,56,220,220
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %eax
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+       movaps  (%rcx),%xmm0
        jnz     .Lenc_loop4
 
-.byte  102,15,56,220,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,220,205
-.byte  102,15,56,220,213
-.byte  102,15,56,220,221
-.byte  102,15,56,221,196
-.byte  102,15,56,221,204
-.byte  102,15,56,221,212
-.byte  102,15,56,221,220
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
+.byte  102,15,56,221,232
        .byte   0xf3,0xc3
 .size  _aesni_encrypt4,.-_aesni_encrypt4
 .type  _aesni_decrypt4,@function
 .align 16
 _aesni_decrypt4:
-       movaps  (%rcx),%xmm4
+       movaps  (%rcx),%xmm0
        shrl    $1,%eax
-       movaps  16(%rcx),%xmm5
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-       pxor    %xmm4,%xmm1
-       pxor    %xmm4,%xmm2
-       pxor    %xmm4,%xmm3
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+       xorps   %xmm0,%xmm4
+       xorps   %xmm0,%xmm5
+       movaps  (%rcx),%xmm0
 
 .Ldec_loop4:
-.byte  102,15,56,222,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,222,205
-       decl    %eax
-.byte  102,15,56,222,213
-.byte  102,15,56,222,221
-.byte  102,15,56,222,196
-       movaps  16(%rcx),%xmm5
-.byte  102,15,56,222,204
-       leaq    32(%rcx),%rcx
-.byte  102,15,56,222,212
-.byte  102,15,56,222,220
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+       decl    %eax
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+       movaps  (%rcx),%xmm0
        jnz     .Ldec_loop4
 
-.byte  102,15,56,222,197
-       movaps  (%rcx),%xmm4
-.byte  102,15,56,222,205
-.byte  102,15,56,222,213
-.byte  102,15,56,222,221
-.byte  102,15,56,223,196
-.byte  102,15,56,223,204
-.byte  102,15,56,223,212
-.byte  102,15,56,223,220
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
+.byte  102,15,56,223,232
        .byte   0xf3,0xc3
 .size  _aesni_decrypt4,.-_aesni_decrypt4
+.type  _aesni_encrypt6,@function
+.align 16
+_aesni_encrypt6:
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,220,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,220,217
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,220,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+       decl    %eax
+.byte  102,15,56,220,241
+       movaps  (%rcx),%xmm0
+.byte  102,15,56,220,249
+       jmp     .Lenc_loop6_enter
+.align 16
+.Lenc_loop6:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %eax
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.Lenc_loop6_enter:
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+       movaps  (%rcx),%xmm0
+       jnz     .Lenc_loop6
+
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
+.byte  102,15,56,221,232
+.byte  102,15,56,221,240
+.byte  102,15,56,221,248
+       .byte   0xf3,0xc3
+.size  _aesni_encrypt6,.-_aesni_encrypt6
+.type  _aesni_decrypt6,@function
+.align 16
+_aesni_decrypt6:
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,222,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,222,217
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,222,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+       decl    %eax
+.byte  102,15,56,222,241
+       movaps  (%rcx),%xmm0
+.byte  102,15,56,222,249
+       jmp     .Ldec_loop6_enter
+.align 16
+.Ldec_loop6:
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+       decl    %eax
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.Ldec_loop6_enter:
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+.byte  102,15,56,222,240
+.byte  102,15,56,222,248
+       movaps  (%rcx),%xmm0
+       jnz     .Ldec_loop6
+
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
+.byte  102,15,56,223,232
+.byte  102,15,56,223,240
+.byte  102,15,56,223,248
+       .byte   0xf3,0xc3
+.size  _aesni_decrypt6,.-_aesni_decrypt6
+.type  _aesni_encrypt8,@function
+.align 16
+_aesni_encrypt8:
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+.byte  102,15,56,220,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,220,217
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,220,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+       decl    %eax
+.byte  102,15,56,220,241
+       pxor    %xmm0,%xmm8
+.byte  102,15,56,220,249
+       pxor    %xmm0,%xmm9
+       movaps  (%rcx),%xmm0
+.byte  102,68,15,56,220,193
+.byte  102,68,15,56,220,201
+       movaps  16(%rcx),%xmm1
+       jmp     .Lenc_loop8_enter
+.align 16
+.Lenc_loop8:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %eax
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.byte  102,68,15,56,220,193
+.byte  102,68,15,56,220,201
+       movaps  16(%rcx),%xmm1
+.Lenc_loop8_enter:
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+.byte  102,68,15,56,220,192
+.byte  102,68,15,56,220,200
+       movaps  (%rcx),%xmm0
+       jnz     .Lenc_loop8
+
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.byte  102,68,15,56,220,193
+.byte  102,68,15,56,220,201
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
+.byte  102,15,56,221,232
+.byte  102,15,56,221,240
+.byte  102,15,56,221,248
+.byte  102,68,15,56,221,192
+.byte  102,68,15,56,221,200
+       .byte   0xf3,0xc3
+.size  _aesni_encrypt8,.-_aesni_encrypt8
+.type  _aesni_decrypt8,@function
+.align 16
+_aesni_decrypt8:
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm0,%xmm3
+.byte  102,15,56,222,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,222,217
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,222,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+       decl    %eax
+.byte  102,15,56,222,241
+       pxor    %xmm0,%xmm8
+.byte  102,15,56,222,249
+       pxor    %xmm0,%xmm9
+       movaps  (%rcx),%xmm0
+.byte  102,68,15,56,222,193
+.byte  102,68,15,56,222,201
+       movaps  16(%rcx),%xmm1
+       jmp     .Ldec_loop8_enter
+.align 16
+.Ldec_loop8:
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+       decl    %eax
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.byte  102,68,15,56,222,193
+.byte  102,68,15,56,222,201
+       movaps  16(%rcx),%xmm1
+.Ldec_loop8_enter:
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+.byte  102,15,56,222,240
+.byte  102,15,56,222,248
+.byte  102,68,15,56,222,192
+.byte  102,68,15,56,222,200
+       movaps  (%rcx),%xmm0
+       jnz     .Ldec_loop8
+
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.byte  102,68,15,56,222,193
+.byte  102,68,15,56,222,201
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
+.byte  102,15,56,223,232
+.byte  102,15,56,223,240
+.byte  102,15,56,223,248
+.byte  102,68,15,56,223,192
+.byte  102,68,15,56,223,200
+       .byte   0xf3,0xc3
+.size  _aesni_decrypt8,.-_aesni_decrypt8
 .globl aesni_ecb_encrypt
 .type  aesni_ecb_encrypt,@function
 .align 16
 aesni_ecb_encrypt:
-       cmpq    $16,%rdx
-       jb      .Lecb_ret
+       andq    $-16,%rdx
+       jz      .Lecb_ret
 
        movl    240(%rcx),%eax
-       andq    $-16,%rdx
+       movaps  (%rcx),%xmm0
        movq    %rcx,%r11
-       testl   %r8d,%r8d
        movl    %eax,%r10d
+       testl   %r8d,%r8d
        jz      .Lecb_decrypt
 
-       subq    $64,%rdx
-       jbe     .Lecb_enc_tail
-       jmp     .Lecb_enc_loop3
+       cmpq    $128,%rdx
+       jb      .Lecb_enc_tail
+
+       movdqu  (%rdi),%xmm2
+       movdqu  16(%rdi),%xmm3
+       movdqu  32(%rdi),%xmm4
+       movdqu  48(%rdi),%xmm5
+       movdqu  64(%rdi),%xmm6
+       movdqu  80(%rdi),%xmm7
+       movdqu  96(%rdi),%xmm8
+       movdqu  112(%rdi),%xmm9
+       leaq    128(%rdi),%rdi
+       subq    $128,%rdx
+       jmp     .Lecb_enc_loop8_enter
 .align 16
-.Lecb_enc_loop3:
-       movups  (%rdi),%xmm0
-       movups  16(%rdi),%xmm1
-       movups  32(%rdi),%xmm2
-       call    _aesni_encrypt3
-       subq    $48,%rdx
-       leaq    48(%rdi),%rdi
-       leaq    48(%rsi),%rsi
-       movups  %xmm0,-48(%rsi)
-       movl    %r10d,%eax
-       movups  %xmm1,-32(%rsi)
+.Lecb_enc_loop8:
+       movups  %xmm2,(%rsi)
        movq    %r11,%rcx
-       movups  %xmm2,-16(%rsi)
-       ja      .Lecb_enc_loop3
+       movdqu  (%rdi),%xmm2
+       movl    %r10d,%eax
+       movups  %xmm3,16(%rsi)
+       movdqu  16(%rdi),%xmm3
+       movups  %xmm4,32(%rsi)
+       movdqu  32(%rdi),%xmm4
+       movups  %xmm5,48(%rsi)
+       movdqu  48(%rdi),%xmm5
+       movups  %xmm6,64(%rsi)
+       movdqu  64(%rdi),%xmm6
+       movups  %xmm7,80(%rsi)
+       movdqu  80(%rdi),%xmm7
+       movups  %xmm8,96(%rsi)
+       movdqu  96(%rdi),%xmm8
+       movups  %xmm9,112(%rsi)
+       leaq    128(%rsi),%rsi
+       movdqu  112(%rdi),%xmm9
+       leaq    128(%rdi),%rdi
+.Lecb_enc_loop8_enter:
 
-.Lecb_enc_tail:
-       addq    $64,%rdx
+       call    _aesni_encrypt8
+
+       subq    $128,%rdx
+       jnc     .Lecb_enc_loop8
+
+       movups  %xmm2,(%rsi)
+       movq    %r11,%rcx
+       movups  %xmm3,16(%rsi)
+       movl    %r10d,%eax
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       movups  %xmm8,96(%rsi)
+       movups  %xmm9,112(%rsi)
+       leaq    128(%rsi),%rsi
+       addq    $128,%rdx
        jz      .Lecb_ret
 
-       cmpq    $16,%rdx
-       movups  (%rdi),%xmm0
-       je      .Lecb_enc_one
+.Lecb_enc_tail:
+       movups  (%rdi),%xmm2
        cmpq    $32,%rdx
-       movups  16(%rdi),%xmm1
+       jb      .Lecb_enc_one
+       movups  16(%rdi),%xmm3
        je      .Lecb_enc_two
-       cmpq    $48,%rdx
-       movups  32(%rdi),%xmm2
-       je      .Lecb_enc_three
-       movups  48(%rdi),%xmm3
-       call    _aesni_encrypt4
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
-       movups  %xmm2,32(%rsi)
-       movups  %xmm3,48(%rsi)
+       movups  32(%rdi),%xmm4
+       cmpq    $64,%rdx
+       jb      .Lecb_enc_three
+       movups  48(%rdi),%xmm5
+       je      .Lecb_enc_four
+       movups  64(%rdi),%xmm6
+       cmpq    $96,%rdx
+       jb      .Lecb_enc_five
+       movups  80(%rdi),%xmm7
+       je      .Lecb_enc_six
+       movdqu  96(%rdi),%xmm8
+       call    _aesni_encrypt8
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       movups  %xmm8,96(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_enc_one:
-       movaps  (%rcx),%xmm4
-       movaps  16(%rcx),%xmm5
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
 .Loop_enc1_3:
-.byte  102,15,56,220,197
+.byte  102,15,56,220,209
        decl    %eax
-       movaps  (%rcx),%xmm5
+       movaps  (%rcx),%xmm1
        leaq    16(%rcx),%rcx
        jnz     .Loop_enc1_3    
-.byte  102,15,56,221,197
-       movups  %xmm0,(%rsi)
+.byte  102,15,56,221,209
+       movups  %xmm2,(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_enc_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_encrypt3
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_enc_three:
        call    _aesni_encrypt3
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
-       movups  %xmm2,32(%rsi)
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_enc_four:
+       call    _aesni_encrypt4
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_enc_five:
+       xorps   %xmm7,%xmm7
+       call    _aesni_encrypt6
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_enc_six:
+       call    _aesni_encrypt6
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
        jmp     .Lecb_ret
 
 .align 16
 .Lecb_decrypt:
-       subq    $64,%rdx
-       jbe     .Lecb_dec_tail
-       jmp     .Lecb_dec_loop3
+       cmpq    $128,%rdx
+       jb      .Lecb_dec_tail
+
+       movdqu  (%rdi),%xmm2
+       movdqu  16(%rdi),%xmm3
+       movdqu  32(%rdi),%xmm4
+       movdqu  48(%rdi),%xmm5
+       movdqu  64(%rdi),%xmm6
+       movdqu  80(%rdi),%xmm7
+       movdqu  96(%rdi),%xmm8
+       movdqu  112(%rdi),%xmm9
+       leaq    128(%rdi),%rdi
+       subq    $128,%rdx
+       jmp     .Lecb_dec_loop8_enter
 .align 16
-.Lecb_dec_loop3:
-       movups  (%rdi),%xmm0
-       movups  16(%rdi),%xmm1
-       movups  32(%rdi),%xmm2
-       call    _aesni_decrypt3
-       subq    $48,%rdx
-       leaq    48(%rdi),%rdi
-       leaq    48(%rsi),%rsi
-       movups  %xmm0,-48(%rsi)
-       movl    %r10d,%eax
-       movups  %xmm1,-32(%rsi)
+.Lecb_dec_loop8:
+       movups  %xmm2,(%rsi)
        movq    %r11,%rcx
-       movups  %xmm2,-16(%rsi)
-       ja      .Lecb_dec_loop3
+       movdqu  (%rdi),%xmm2
+       movl    %r10d,%eax
+       movups  %xmm3,16(%rsi)
+       movdqu  16(%rdi),%xmm3
+       movups  %xmm4,32(%rsi)
+       movdqu  32(%rdi),%xmm4
+       movups  %xmm5,48(%rsi)
+       movdqu  48(%rdi),%xmm5
+       movups  %xmm6,64(%rsi)
+       movdqu  64(%rdi),%xmm6
+       movups  %xmm7,80(%rsi)
+       movdqu  80(%rdi),%xmm7
+       movups  %xmm8,96(%rsi)
+       movdqu  96(%rdi),%xmm8
+       movups  %xmm9,112(%rsi)
+       leaq    128(%rsi),%rsi
+       movdqu  112(%rdi),%xmm9
+       leaq    128(%rdi),%rdi
+.Lecb_dec_loop8_enter:
 
-.Lecb_dec_tail:
-       addq    $64,%rdx
+       call    _aesni_decrypt8
+
+       movaps  (%r11),%xmm0
+       subq    $128,%rdx
+       jnc     .Lecb_dec_loop8
+
+       movups  %xmm2,(%rsi)
+       movq    %r11,%rcx
+       movups  %xmm3,16(%rsi)
+       movl    %r10d,%eax
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       movups  %xmm8,96(%rsi)
+       movups  %xmm9,112(%rsi)
+       leaq    128(%rsi),%rsi
+       addq    $128,%rdx
        jz      .Lecb_ret
 
-       cmpq    $16,%rdx
-       movups  (%rdi),%xmm0
-       je      .Lecb_dec_one
+.Lecb_dec_tail:
+       movups  (%rdi),%xmm2
        cmpq    $32,%rdx
-       movups  16(%rdi),%xmm1
+       jb      .Lecb_dec_one
+       movups  16(%rdi),%xmm3
        je      .Lecb_dec_two
-       cmpq    $48,%rdx
-       movups  32(%rdi),%xmm2
-       je      .Lecb_dec_three
-       movups  48(%rdi),%xmm3
-       call    _aesni_decrypt4
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
-       movups  %xmm2,32(%rsi)
-       movups  %xmm3,48(%rsi)
+       movups  32(%rdi),%xmm4
+       cmpq    $64,%rdx
+       jb      .Lecb_dec_three
+       movups  48(%rdi),%xmm5
+       je      .Lecb_dec_four
+       movups  64(%rdi),%xmm6
+       cmpq    $96,%rdx
+       jb      .Lecb_dec_five
+       movups  80(%rdi),%xmm7
+       je      .Lecb_dec_six
+       movups  96(%rdi),%xmm8
+       movaps  (%rcx),%xmm0
+       call    _aesni_decrypt8
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       movups  %xmm8,96(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_dec_one:
-       movaps  (%rcx),%xmm4
-       movaps  16(%rcx),%xmm5
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
 .Loop_dec1_4:
-.byte  102,15,56,222,197
+.byte  102,15,56,222,209
        decl    %eax
-       movaps  (%rcx),%xmm5
+       movaps  (%rcx),%xmm1
        leaq    16(%rcx),%rcx
        jnz     .Loop_dec1_4    
-.byte  102,15,56,223,197
-       movups  %xmm0,(%rsi)
+.byte  102,15,56,223,209
+       movups  %xmm2,(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_dec_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_decrypt3
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
        jmp     .Lecb_ret
 .align 16
 .Lecb_dec_three:
        call    _aesni_decrypt3
-       movups  %xmm0,(%rsi)
-       movups  %xmm1,16(%rsi)
-       movups  %xmm2,32(%rsi)
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_dec_four:
+       call    _aesni_decrypt4
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_dec_five:
+       xorps   %xmm7,%xmm7
+       call    _aesni_decrypt6
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       jmp     .Lecb_ret
+.align 16
+.Lecb_dec_six:
+       call    _aesni_decrypt6
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
 
 .Lecb_ret:
        .byte   0xf3,0xc3
 .size  aesni_ecb_encrypt,.-aesni_ecb_encrypt
+.globl aesni_ccm64_encrypt_blocks
+.type  aesni_ccm64_encrypt_blocks,@function
+.align 16
+aesni_ccm64_encrypt_blocks:
+       movdqu  (%r8),%xmm9
+       movdqu  (%r9),%xmm3
+       movdqa  .Lincrement64(%rip),%xmm8
+       movdqa  .Lbswap_mask(%rip),%xmm9
+.byte  102,69,15,56,0,201
+
+       movl    240(%rcx),%eax
+       movq    %rcx,%r11
+       movl    %eax,%r10d
+       movdqa  %xmm9,%xmm2
+
+.Lccm64_enc_outer:
+       movups  (%rdi),%xmm8
+.byte  102,65,15,56,0,209
+       movq    %r11,%rcx
+       movl    %r10d,%eax
+
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       xorps   %xmm0,%xmm8
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm3,%xmm8
+       movaps  (%rcx),%xmm0
+
+.Lccm64_enc2_loop:
+.byte  102,15,56,220,209
+       decl    %eax
+.byte  102,15,56,220,217
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,216
+       movaps  0(%rcx),%xmm0
+       jnz     .Lccm64_enc2_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+
+       paddq   %xmm8,%xmm9
+       decq    %rdx
+       leaq    16(%rdi),%rdi
+       xorps   %xmm2,%xmm8
+       movdqa  %xmm9,%xmm2
+       movups  %xmm8,(%rsi)
+       leaq    16(%rsi),%rsi
+       jnz     .Lccm64_enc_outer
+
+       movups  %xmm3,(%r9)
+       .byte   0xf3,0xc3
+.size  aesni_ccm64_encrypt_blocks,.-aesni_ccm64_encrypt_blocks
+.globl aesni_ccm64_decrypt_blocks
+.type  aesni_ccm64_decrypt_blocks,@function
+.align 16
+aesni_ccm64_decrypt_blocks:
+       movdqu  (%r8),%xmm9
+       movdqu  (%r9),%xmm3
+       movdqa  .Lincrement64(%rip),%xmm8
+       movdqa  .Lbswap_mask(%rip),%xmm9
+
+       movl    240(%rcx),%eax
+       movdqa  %xmm9,%xmm2
+.byte  102,69,15,56,0,201
+       movl    %eax,%r10d
+       movq    %rcx,%r11
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_enc1_5:
+.byte  102,15,56,220,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_enc1_5    
+.byte  102,15,56,221,209
+.Lccm64_dec_outer:
+       paddq   %xmm8,%xmm9
+       movups  (%rdi),%xmm8
+       xorps   %xmm2,%xmm8
+       movdqa  %xmm9,%xmm2
+       leaq    16(%rdi),%rdi
+.byte  102,65,15,56,0,209
+       movq    %r11,%rcx
+       movl    %r10d,%eax
+       movups  %xmm8,(%rsi)
+       leaq    16(%rsi),%rsi
+
+       subq    $1,%rdx
+       jz      .Lccm64_dec_break
+
+       movaps  (%rcx),%xmm0
+       shrl    $1,%eax
+       movaps  16(%rcx),%xmm1
+       xorps   %xmm0,%xmm8
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm8,%xmm3
+       movaps  (%rcx),%xmm0
+
+.Lccm64_dec2_loop:
+.byte  102,15,56,220,209
+       decl    %eax
+.byte  102,15,56,220,217
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,216
+       movaps  0(%rcx),%xmm0
+       jnz     .Lccm64_dec2_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,221,208
+       jmp     .Lccm64_dec_outer
+
+.align 16
+.Lccm64_dec_break:
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm3
+.Loop_enc1_6:
+.byte  102,15,56,220,217
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_enc1_6    
+.byte  102,15,56,221,217
+       movups  %xmm3,(%r9)
+       .byte   0xf3,0xc3
+.size  aesni_ccm64_decrypt_blocks,.-aesni_ccm64_decrypt_blocks
+.globl aesni_ctr32_encrypt_blocks
+.type  aesni_ctr32_encrypt_blocks,@function
+.align 16
+aesni_ctr32_encrypt_blocks:
+       cmpq    $1,%rdx
+       je      .Lctr32_one_shortcut
+
+       movdqu  (%r8),%xmm14
+       movdqa  .Lbswap_mask(%rip),%xmm15
+       xorl    %eax,%eax
+.byte  102,69,15,58,22,242,3
+.byte  102,68,15,58,34,240,3
+
+       movl    240(%rcx),%eax
+       bswapl  %r10d
+       pxor    %xmm12,%xmm12
+       pxor    %xmm13,%xmm13
+.byte  102,69,15,58,34,226,0
+       leaq    3(%r10),%r11
+.byte  102,69,15,58,34,235,0
+       incl    %r10d
+.byte  102,69,15,58,34,226,1
+       incq    %r11
+.byte  102,69,15,58,34,235,1
+       incl    %r10d
+.byte  102,69,15,58,34,226,2
+       incq    %r11
+.byte  102,69,15,58,34,235,2
+       movdqa  %xmm12,-40(%rsp)
+.byte  102,69,15,56,0,231
+       movdqa  %xmm13,-24(%rsp)
+.byte  102,69,15,56,0,239
+
+       pshufd  $192,%xmm12,%xmm2
+       pshufd  $128,%xmm12,%xmm3
+       pshufd  $64,%xmm12,%xmm4
+       cmpq    $6,%rdx
+       jb      .Lctr32_tail
+       shrl    $1,%eax
+       movq    %rcx,%r11
+       movl    %eax,%r10d
+       subq    $6,%rdx
+       jmp     .Lctr32_loop6
+
+.align 16
+.Lctr32_loop6:
+       pshufd  $192,%xmm13,%xmm5
+       por     %xmm14,%xmm2
+       movaps  (%r11),%xmm0
+       pshufd  $128,%xmm13,%xmm6
+       por     %xmm14,%xmm3
+       movaps  16(%r11),%xmm1
+       pshufd  $64,%xmm13,%xmm7
+       por     %xmm14,%xmm4
+       por     %xmm14,%xmm5
+       xorps   %xmm0,%xmm2
+       por     %xmm14,%xmm6
+       por     %xmm14,%xmm7
+
+
+
+
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,220,209
+       leaq    32(%r11),%rcx
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,220,217
+       movdqa  .Lincrement32(%rip),%xmm13
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,220,225
+       movdqa  -40(%rsp),%xmm12
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+       movaps  (%rcx),%xmm0
+       decl    %eax
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+       jmp     .Lctr32_enc_loop6_enter
+.align 16
+.Lctr32_enc_loop6:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %eax
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.Lctr32_enc_loop6_enter:
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+       movaps  (%rcx),%xmm0
+       jnz     .Lctr32_enc_loop6
+
+.byte  102,15,56,220,209
+       paddd   %xmm13,%xmm12
+.byte  102,15,56,220,217
+       paddd   -24(%rsp),%xmm13
+.byte  102,15,56,220,225
+       movdqa  %xmm12,-40(%rsp)
+.byte  102,15,56,220,233
+       movdqa  %xmm13,-24(%rsp)
+.byte  102,15,56,220,241
+.byte  102,69,15,56,0,231
+.byte  102,15,56,220,249
+.byte  102,69,15,56,0,239
+
+.byte  102,15,56,221,208
+       movups  (%rdi),%xmm8
+.byte  102,15,56,221,216
+       movups  16(%rdi),%xmm9
+.byte  102,15,56,221,224
+       movups  32(%rdi),%xmm10
+.byte  102,15,56,221,232
+       movups  48(%rdi),%xmm11
+.byte  102,15,56,221,240
+       movups  64(%rdi),%xmm1
+.byte  102,15,56,221,248
+       movups  80(%rdi),%xmm0
+       leaq    96(%rdi),%rdi
+
+       xorps   %xmm2,%xmm8
+       pshufd  $192,%xmm12,%xmm2
+       xorps   %xmm3,%xmm9
+       pshufd  $128,%xmm12,%xmm3
+       movups  %xmm8,(%rsi)
+       xorps   %xmm4,%xmm10
+       pshufd  $64,%xmm12,%xmm4
+       movups  %xmm9,16(%rsi)
+       xorps   %xmm5,%xmm11
+       movups  %xmm10,32(%rsi)
+       xorps   %xmm6,%xmm1
+       movups  %xmm11,48(%rsi)
+       xorps   %xmm7,%xmm0
+       movups  %xmm1,64(%rsi)
+       movups  %xmm0,80(%rsi)
+       leaq    96(%rsi),%rsi
+       movl    %r10d,%eax
+       subq    $6,%rdx
+       jnc     .Lctr32_loop6
+
+       addq    $6,%rdx
+       jz      .Lctr32_done
+       movq    %r11,%rcx
+       leal    1(%rax,%rax,1),%eax
+
+.Lctr32_tail:
+       por     %xmm14,%xmm2
+       movups  (%rdi),%xmm8
+       cmpq    $2,%rdx
+       jb      .Lctr32_one
+
+       por     %xmm14,%xmm3
+       movups  16(%rdi),%xmm9
+       je      .Lctr32_two
+
+       pshufd  $192,%xmm13,%xmm5
+       por     %xmm14,%xmm4
+       movups  32(%rdi),%xmm10
+       cmpq    $4,%rdx
+       jb      .Lctr32_three
+
+       pshufd  $128,%xmm13,%xmm6
+       por     %xmm14,%xmm5
+       movups  48(%rdi),%xmm11
+       je      .Lctr32_four
+
+       por     %xmm14,%xmm6
+       xorps   %xmm7,%xmm7
+
+       call    _aesni_encrypt6
+
+       movups  64(%rdi),%xmm1
+       xorps   %xmm2,%xmm8
+       xorps   %xmm3,%xmm9
+       movups  %xmm8,(%rsi)
+       xorps   %xmm4,%xmm10
+       movups  %xmm9,16(%rsi)
+       xorps   %xmm5,%xmm11
+       movups  %xmm10,32(%rsi)
+       xorps   %xmm6,%xmm1
+       movups  %xmm11,48(%rsi)
+       movups  %xmm1,64(%rsi)
+       jmp     .Lctr32_done
+
+.align 16
+.Lctr32_one_shortcut:
+       movups  (%r8),%xmm2
+       movups  (%rdi),%xmm8
+       movl    240(%rcx),%eax
+.Lctr32_one:
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_enc1_7:
+.byte  102,15,56,220,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_enc1_7    
+.byte  102,15,56,221,209
+       xorps   %xmm2,%xmm8
+       movups  %xmm8,(%rsi)
+       jmp     .Lctr32_done
+
+.align 16
+.Lctr32_two:
+       xorps   %xmm4,%xmm4
+       call    _aesni_encrypt3
+       xorps   %xmm2,%xmm8
+       xorps   %xmm3,%xmm9
+       movups  %xmm8,(%rsi)
+       movups  %xmm9,16(%rsi)
+       jmp     .Lctr32_done
+
+.align 16
+.Lctr32_three:
+       call    _aesni_encrypt3
+       xorps   %xmm2,%xmm8
+       xorps   %xmm3,%xmm9
+       movups  %xmm8,(%rsi)
+       xorps   %xmm4,%xmm10
+       movups  %xmm9,16(%rsi)
+       movups  %xmm10,32(%rsi)
+       jmp     .Lctr32_done
+
+.align 16
+.Lctr32_four:
+       call    _aesni_encrypt4
+       xorps   %xmm2,%xmm8
+       xorps   %xmm3,%xmm9
+       movups  %xmm8,(%rsi)
+       xorps   %xmm4,%xmm10
+       movups  %xmm9,16(%rsi)
+       xorps   %xmm5,%xmm11
+       movups  %xmm10,32(%rsi)
+       movups  %xmm11,48(%rsi)
+
+.Lctr32_done:
+       .byte   0xf3,0xc3
+.size  aesni_ctr32_encrypt_blocks,.-aesni_ctr32_encrypt_blocks
+.globl aesni_xts_encrypt
+.type  aesni_xts_encrypt,@function
+.align 16
+aesni_xts_encrypt:
+       leaq    -104(%rsp),%rsp
+       movups  (%r9),%xmm15
+       movl    240(%r8),%eax
+       movl    240(%rcx),%r10d
+       movaps  (%r8),%xmm0
+       movaps  16(%r8),%xmm1
+       leaq    32(%r8),%r8
+       xorps   %xmm0,%xmm15
+.Loop_enc1_8:
+.byte  102,68,15,56,220,249
+       decl    %eax
+       movaps  (%r8),%xmm1
+       leaq    16(%r8),%r8
+       jnz     .Loop_enc1_8    
+.byte  102,68,15,56,221,249
+       movq    %rcx,%r11
+       movl    %r10d,%eax
+       movq    %rdx,%r9
+       andq    $-16,%rdx
+
+       movdqa  .Lxts_magic(%rip),%xmm8
+       pxor    %xmm14,%xmm14
+       pcmpgtd %xmm15,%xmm14
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm10
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm11
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm12
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm13
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       subq    $96,%rdx
+       jc      .Lxts_enc_short
+
+       shrl    $1,%eax
+       subl    $1,%eax
+       movl    %eax,%r10d
+       jmp     .Lxts_enc_grandloop
+
+.align 16
+.Lxts_enc_grandloop:
+       pshufd  $19,%xmm14,%xmm9
+       movdqa  %xmm15,%xmm14
+       paddq   %xmm15,%xmm15
+       movdqu  0(%rdi),%xmm2
+       pand    %xmm8,%xmm9
+       movdqu  16(%rdi),%xmm3
+       pxor    %xmm9,%xmm15
+
+       movdqu  32(%rdi),%xmm4
+       pxor    %xmm10,%xmm2
+       movdqu  48(%rdi),%xmm5
+       pxor    %xmm11,%xmm3
+       movdqu  64(%rdi),%xmm6
+       pxor    %xmm12,%xmm4
+       movdqu  80(%rdi),%xmm7
+       leaq    96(%rdi),%rdi
+       pxor    %xmm13,%xmm5
+       movaps  (%r11),%xmm0
+       pxor    %xmm14,%xmm6
+       pxor    %xmm15,%xmm7
+
+
+
+       movaps  16(%r11),%xmm1
+       pxor    %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       movdqa  %xmm10,0(%rsp)
+.byte  102,15,56,220,209
+       leaq    32(%r11),%rcx
+       pxor    %xmm0,%xmm4
+       movdqa  %xmm11,16(%rsp)
+.byte  102,15,56,220,217
+       pxor    %xmm0,%xmm5
+       movdqa  %xmm12,32(%rsp)
+.byte  102,15,56,220,225
+       pxor    %xmm0,%xmm6
+       movdqa  %xmm13,48(%rsp)
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+       movaps  (%rcx),%xmm0
+       decl    %eax
+       movdqa  %xmm14,64(%rsp)
+.byte  102,15,56,220,241
+       movdqa  %xmm15,80(%rsp)
+.byte  102,15,56,220,249
+       pxor    %xmm14,%xmm14
+       pcmpgtd %xmm15,%xmm14
+       jmp     .Lxts_enc_loop6_enter
+
+.align 16
+.Lxts_enc_loop6:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %eax
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.Lxts_enc_loop6_enter:
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+       movaps  (%rcx),%xmm0
+       jnz     .Lxts_enc_loop6
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,220,209
+       pand    %xmm8,%xmm9
+.byte  102,15,56,220,217
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,220,225
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+       movaps  16(%rcx),%xmm1
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm10
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,220,208
+       pand    %xmm8,%xmm9
+.byte  102,15,56,220,216
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,220,224
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+       movaps  32(%rcx),%xmm0
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm11
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,220,209
+       pand    %xmm8,%xmm9
+.byte  102,15,56,220,217
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,220,225
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm12
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,221,208
+       pand    %xmm8,%xmm9
+.byte  102,15,56,221,216
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,221,224
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,221,232
+.byte  102,15,56,221,240
+.byte  102,15,56,221,248
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm13
+       paddq   %xmm15,%xmm15
+       xorps   0(%rsp),%xmm2
+       pand    %xmm8,%xmm9
+       xorps   16(%rsp),%xmm3
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+
+       xorps   32(%rsp),%xmm4
+       movups  %xmm2,0(%rsi)
+       xorps   48(%rsp),%xmm5
+       movups  %xmm3,16(%rsi)
+       xorps   64(%rsp),%xmm6
+       movups  %xmm4,32(%rsi)
+       xorps   80(%rsp),%xmm7
+       movups  %xmm5,48(%rsi)
+       movl    %r10d,%eax
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       leaq    96(%rsi),%rsi
+       subq    $96,%rdx
+       jnc     .Lxts_enc_grandloop
+
+       leal    3(%rax,%rax,1),%eax
+       movq    %r11,%rcx
+       movl    %eax,%r10d
+
+.Lxts_enc_short:
+       addq    $96,%rdx
+       jz      .Lxts_enc_done
+
+       cmpq    $32,%rdx
+       jb      .Lxts_enc_one
+       je      .Lxts_enc_two
+
+       cmpq    $64,%rdx
+       jb      .Lxts_enc_three
+       je      .Lxts_enc_four
+
+       pshufd  $19,%xmm14,%xmm9
+       movdqa  %xmm15,%xmm14
+       paddq   %xmm15,%xmm15
+       movdqu  (%rdi),%xmm2
+       pand    %xmm8,%xmm9
+       movdqu  16(%rdi),%xmm3
+       pxor    %xmm9,%xmm15
+
+       movdqu  32(%rdi),%xmm4
+       pxor    %xmm10,%xmm2
+       movdqu  48(%rdi),%xmm5
+       pxor    %xmm11,%xmm3
+       movdqu  64(%rdi),%xmm6
+       leaq    80(%rdi),%rdi
+       pxor    %xmm12,%xmm4
+       pxor    %xmm13,%xmm5
+       pxor    %xmm14,%xmm6
+
+       call    _aesni_encrypt6
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm15,%xmm10
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       movdqu  %xmm2,(%rsi)
+       xorps   %xmm13,%xmm5
+       movdqu  %xmm3,16(%rsi)
+       xorps   %xmm14,%xmm6
+       movdqu  %xmm4,32(%rsi)
+       movdqu  %xmm5,48(%rsi)
+       movdqu  %xmm6,64(%rsi)
+       leaq    80(%rsi),%rsi
+       jmp     .Lxts_enc_done
+
+.align 16
+.Lxts_enc_one:
+       movups  (%rdi),%xmm2
+       leaq    16(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_enc1_9:
+.byte  102,15,56,220,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_enc1_9    
+.byte  102,15,56,221,209
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm11,%xmm10
+       movups  %xmm2,(%rsi)
+       leaq    16(%rsi),%rsi
+       jmp     .Lxts_enc_done
+
+.align 16
+.Lxts_enc_two:
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       leaq    32(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       xorps   %xmm11,%xmm3
+
+       call    _aesni_encrypt3
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm12,%xmm10
+       xorps   %xmm11,%xmm3
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       leaq    32(%rsi),%rsi
+       jmp     .Lxts_enc_done
+
+.align 16
+.Lxts_enc_three:
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       movups  32(%rdi),%xmm4
+       leaq    48(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+
+       call    _aesni_encrypt3
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm13,%xmm10
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       leaq    48(%rsi),%rsi
+       jmp     .Lxts_enc_done
+
+.align 16
+.Lxts_enc_four:
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       movups  32(%rdi),%xmm4
+       xorps   %xmm10,%xmm2
+       movups  48(%rdi),%xmm5
+       leaq    64(%rdi),%rdi
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       xorps   %xmm13,%xmm5
+
+       call    _aesni_encrypt4
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm15,%xmm10
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       movups  %xmm2,(%rsi)
+       xorps   %xmm13,%xmm5
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       leaq    64(%rsi),%rsi
+       jmp     .Lxts_enc_done
+
+.align 16
+.Lxts_enc_done:
+       andq    $15,%r9
+       jz      .Lxts_enc_ret
+       movq    %r9,%rdx
+
+.Lxts_enc_steal:
+       movzbl  (%rdi),%eax
+       movzbl  -16(%rsi),%ecx
+       leaq    1(%rdi),%rdi
+       movb    %al,-16(%rsi)
+       movb    %cl,0(%rsi)
+       leaq    1(%rsi),%rsi
+       subq    $1,%rdx
+       jnz     .Lxts_enc_steal
+
+       subq    %r9,%rsi
+       movq    %r11,%rcx
+       movl    %r10d,%eax
+
+       movups  -16(%rsi),%xmm2
+       xorps   %xmm10,%xmm2
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_enc1_10:
+.byte  102,15,56,220,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_enc1_10   
+.byte  102,15,56,221,209
+       xorps   %xmm10,%xmm2
+       movups  %xmm2,-16(%rsi)
+
+.Lxts_enc_ret:
+       leaq    104(%rsp),%rsp
+.Lxts_enc_epilogue:
+       .byte   0xf3,0xc3
+.size  aesni_xts_encrypt,.-aesni_xts_encrypt
+.globl aesni_xts_decrypt
+.type  aesni_xts_decrypt,@function
+.align 16
+aesni_xts_decrypt:
+       leaq    -104(%rsp),%rsp
+       movups  (%r9),%xmm15
+       movl    240(%r8),%eax
+       movl    240(%rcx),%r10d
+       movaps  (%r8),%xmm0
+       movaps  16(%r8),%xmm1
+       leaq    32(%r8),%r8
+       xorps   %xmm0,%xmm15
+.Loop_enc1_11:
+.byte  102,68,15,56,220,249
+       decl    %eax
+       movaps  (%r8),%xmm1
+       leaq    16(%r8),%r8
+       jnz     .Loop_enc1_11   
+.byte  102,68,15,56,221,249
+       xorl    %eax,%eax
+       testq   $15,%rdx
+       setnz   %al
+       shlq    $4,%rax
+       subq    %rax,%rdx
+
+       movq    %rcx,%r11
+       movl    %r10d,%eax
+       movq    %rdx,%r9
+       andq    $-16,%rdx
+
+       movdqa  .Lxts_magic(%rip),%xmm8
+       pxor    %xmm14,%xmm14
+       pcmpgtd %xmm15,%xmm14
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm10
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm11
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm12
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm13
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm9
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+       subq    $96,%rdx
+       jc      .Lxts_dec_short
+
+       shrl    $1,%eax
+       subl    $1,%eax
+       movl    %eax,%r10d
+       jmp     .Lxts_dec_grandloop
+
+.align 16
+.Lxts_dec_grandloop:
+       pshufd  $19,%xmm14,%xmm9
+       movdqa  %xmm15,%xmm14
+       paddq   %xmm15,%xmm15
+       movdqu  0(%rdi),%xmm2
+       pand    %xmm8,%xmm9
+       movdqu  16(%rdi),%xmm3
+       pxor    %xmm9,%xmm15
+
+       movdqu  32(%rdi),%xmm4
+       pxor    %xmm10,%xmm2
+       movdqu  48(%rdi),%xmm5
+       pxor    %xmm11,%xmm3
+       movdqu  64(%rdi),%xmm6
+       pxor    %xmm12,%xmm4
+       movdqu  80(%rdi),%xmm7
+       leaq    96(%rdi),%rdi
+       pxor    %xmm13,%xmm5
+       movaps  (%r11),%xmm0
+       pxor    %xmm14,%xmm6
+       pxor    %xmm15,%xmm7
+
+
+
+       movaps  16(%r11),%xmm1
+       pxor    %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       movdqa  %xmm10,0(%rsp)
+.byte  102,15,56,222,209
+       leaq    32(%r11),%rcx
+       pxor    %xmm0,%xmm4
+       movdqa  %xmm11,16(%rsp)
+.byte  102,15,56,222,217
+       pxor    %xmm0,%xmm5
+       movdqa  %xmm12,32(%rsp)
+.byte  102,15,56,222,225
+       pxor    %xmm0,%xmm6
+       movdqa  %xmm13,48(%rsp)
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+       movaps  (%rcx),%xmm0
+       decl    %eax
+       movdqa  %xmm14,64(%rsp)
+.byte  102,15,56,222,241
+       movdqa  %xmm15,80(%rsp)
+.byte  102,15,56,222,249
+       pxor    %xmm14,%xmm14
+       pcmpgtd %xmm15,%xmm14
+       jmp     .Lxts_dec_loop6_enter
+
+.align 16
+.Lxts_dec_loop6:
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+       decl    %eax
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.Lxts_dec_loop6_enter:
+       movaps  16(%rcx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leaq    32(%rcx),%rcx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+.byte  102,15,56,222,240
+.byte  102,15,56,222,248
+       movaps  (%rcx),%xmm0
+       jnz     .Lxts_dec_loop6
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,222,209
+       pand    %xmm8,%xmm9
+.byte  102,15,56,222,217
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,222,225
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+       movaps  16(%rcx),%xmm1
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm10
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,222,208
+       pand    %xmm8,%xmm9
+.byte  102,15,56,222,216
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,222,224
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,222,232
+.byte  102,15,56,222,240
+.byte  102,15,56,222,248
+       movaps  32(%rcx),%xmm0
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm11
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,222,209
+       pand    %xmm8,%xmm9
+.byte  102,15,56,222,217
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,222,225
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm12
+       paddq   %xmm15,%xmm15
+.byte  102,15,56,223,208
+       pand    %xmm8,%xmm9
+.byte  102,15,56,223,216
+       pcmpgtd %xmm15,%xmm14
+.byte  102,15,56,223,224
+       pxor    %xmm9,%xmm15
+.byte  102,15,56,223,232
+.byte  102,15,56,223,240
+.byte  102,15,56,223,248
+
+       pshufd  $19,%xmm14,%xmm9
+       pxor    %xmm14,%xmm14
+       movdqa  %xmm15,%xmm13
+       paddq   %xmm15,%xmm15
+       xorps   0(%rsp),%xmm2
+       pand    %xmm8,%xmm9
+       xorps   16(%rsp),%xmm3
+       pcmpgtd %xmm15,%xmm14
+       pxor    %xmm9,%xmm15
+
+       xorps   32(%rsp),%xmm4
+       movups  %xmm2,0(%rsi)
+       xorps   48(%rsp),%xmm5
+       movups  %xmm3,16(%rsi)
+       xorps   64(%rsp),%xmm6
+       movups  %xmm4,32(%rsi)
+       xorps   80(%rsp),%xmm7
+       movups  %xmm5,48(%rsi)
+       movl    %r10d,%eax
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       leaq    96(%rsi),%rsi
+       subq    $96,%rdx
+       jnc     .Lxts_dec_grandloop
+
+       leal    3(%rax,%rax,1),%eax
+       movq    %r11,%rcx
+       movl    %eax,%r10d
+
+.Lxts_dec_short:
+       addq    $96,%rdx
+       jz      .Lxts_dec_done
+
+       cmpq    $32,%rdx
+       jb      .Lxts_dec_one
+       je      .Lxts_dec_two
+
+       cmpq    $64,%rdx
+       jb      .Lxts_dec_three
+       je      .Lxts_dec_four
+
+       pshufd  $19,%xmm14,%xmm9
+       movdqa  %xmm15,%xmm14
+       paddq   %xmm15,%xmm15
+       movdqu  (%rdi),%xmm2
+       pand    %xmm8,%xmm9
+       movdqu  16(%rdi),%xmm3
+       pxor    %xmm9,%xmm15
+
+       movdqu  32(%rdi),%xmm4
+       pxor    %xmm10,%xmm2
+       movdqu  48(%rdi),%xmm5
+       pxor    %xmm11,%xmm3
+       movdqu  64(%rdi),%xmm6
+       leaq    80(%rdi),%rdi
+       pxor    %xmm12,%xmm4
+       pxor    %xmm13,%xmm5
+       pxor    %xmm14,%xmm6
+
+       call    _aesni_decrypt6
+
+       xorps   %xmm10,%xmm2
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       movdqu  %xmm2,(%rsi)
+       xorps   %xmm13,%xmm5
+       movdqu  %xmm3,16(%rsi)
+       xorps   %xmm14,%xmm6
+       movdqu  %xmm4,32(%rsi)
+       pxor    %xmm14,%xmm14
+       movdqu  %xmm5,48(%rsi)
+       pcmpgtd %xmm15,%xmm14
+       movdqu  %xmm6,64(%rsi)
+       leaq    80(%rsi),%rsi
+       pshufd  $19,%xmm14,%xmm11
+       andq    $15,%r9
+       jz      .Lxts_dec_ret
+
+       movdqa  %xmm15,%xmm10
+       paddq   %xmm15,%xmm15
+       pand    %xmm8,%xmm11
+       pxor    %xmm15,%xmm11
+       jmp     .Lxts_dec_done2
+
+.align 16
+.Lxts_dec_one:
+       movups  (%rdi),%xmm2
+       leaq    16(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_dec1_12:
+.byte  102,15,56,222,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_dec1_12   
+.byte  102,15,56,223,209
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm11,%xmm10
+       movups  %xmm2,(%rsi)
+       movdqa  %xmm12,%xmm11
+       leaq    16(%rsi),%rsi
+       jmp     .Lxts_dec_done
+
+.align 16
+.Lxts_dec_two:
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       leaq    32(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       xorps   %xmm11,%xmm3
+
+       call    _aesni_decrypt3
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm12,%xmm10
+       xorps   %xmm11,%xmm3
+       movdqa  %xmm13,%xmm11
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       leaq    32(%rsi),%rsi
+       jmp     .Lxts_dec_done
+
+.align 16
+.Lxts_dec_three:
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       movups  32(%rdi),%xmm4
+       leaq    48(%rdi),%rdi
+       xorps   %xmm10,%xmm2
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+
+       call    _aesni_decrypt3
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm13,%xmm10
+       xorps   %xmm11,%xmm3
+       movdqa  %xmm15,%xmm11
+       xorps   %xmm12,%xmm4
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       leaq    48(%rsi),%rsi
+       jmp     .Lxts_dec_done
+
+.align 16
+.Lxts_dec_four:
+       pshufd  $19,%xmm14,%xmm9
+       movdqa  %xmm15,%xmm14
+       paddq   %xmm15,%xmm15
+       movups  (%rdi),%xmm2
+       pand    %xmm8,%xmm9
+       movups  16(%rdi),%xmm3
+       pxor    %xmm9,%xmm15
+
+       movups  32(%rdi),%xmm4
+       xorps   %xmm10,%xmm2
+       movups  48(%rdi),%xmm5
+       leaq    64(%rdi),%rdi
+       xorps   %xmm11,%xmm3
+       xorps   %xmm12,%xmm4
+       xorps   %xmm13,%xmm5
+
+       call    _aesni_decrypt4
+
+       xorps   %xmm10,%xmm2
+       movdqa  %xmm14,%xmm10
+       xorps   %xmm11,%xmm3
+       movdqa  %xmm15,%xmm11
+       xorps   %xmm12,%xmm4
+       movups  %xmm2,(%rsi)
+       xorps   %xmm13,%xmm5
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       leaq    64(%rsi),%rsi
+       jmp     .Lxts_dec_done
+
+.align 16
+.Lxts_dec_done:
+       andq    $15,%r9
+       jz      .Lxts_dec_ret
+.Lxts_dec_done2:
+       movq    %r9,%rdx
+       movq    %r11,%rcx
+       movl    %r10d,%eax
+
+       movups  (%rdi),%xmm2
+       xorps   %xmm11,%xmm2
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_dec1_13:
+.byte  102,15,56,222,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_dec1_13   
+.byte  102,15,56,223,209
+       xorps   %xmm11,%xmm2
+       movups  %xmm2,(%rsi)
+
+.Lxts_dec_steal:
+       movzbl  16(%rdi),%eax
+       movzbl  (%rsi),%ecx
+       leaq    1(%rdi),%rdi
+       movb    %al,(%rsi)
+       movb    %cl,16(%rsi)
+       leaq    1(%rsi),%rsi
+       subq    $1,%rdx
+       jnz     .Lxts_dec_steal
+
+       subq    %r9,%rsi
+       movq    %r11,%rcx
+       movl    %r10d,%eax
+
+       movups  (%rsi),%xmm2
+       xorps   %xmm10,%xmm2
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       leaq    32(%rcx),%rcx
+       xorps   %xmm0,%xmm2
+.Loop_dec1_14:
+.byte  102,15,56,222,209
+       decl    %eax
+       movaps  (%rcx),%xmm1
+       leaq    16(%rcx),%rcx
+       jnz     .Loop_dec1_14   
+.byte  102,15,56,223,209
+       xorps   %xmm10,%xmm2
+       movups  %xmm2,(%rsi)
+
+.Lxts_dec_ret:
+       leaq    104(%rsp),%rsp
+.Lxts_dec_epilogue:
+       .byte   0xf3,0xc3
+.size  aesni_xts_decrypt,.-aesni_xts_decrypt
 .globl aesni_cbc_encrypt
 .type  aesni_cbc_encrypt,@function
 .align 16
@@ -385,37 +2009,38 @@ aesni_cbc_encrypt:
        testl   %r9d,%r9d
        jz      .Lcbc_decrypt
 
-       movups  (%r8),%xmm0
-       cmpq    $16,%rdx
+       movups  (%r8),%xmm2
        movl    %r10d,%eax
+       cmpq    $16,%rdx
        jb      .Lcbc_enc_tail
        subq    $16,%rdx
        jmp     .Lcbc_enc_loop
 .align 16
 .Lcbc_enc_loop:
-       movups  (%rdi),%xmm1
+       movups  (%rdi),%xmm3
        leaq    16(%rdi),%rdi
-       pxor    %xmm1,%xmm0
-       movaps  (%rcx),%xmm4
-       movaps  16(%rcx),%xmm5
+
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
+       xorps   %xmm0,%xmm3
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-.Loop_enc1_5:
-.byte  102,15,56,220,197
+       xorps   %xmm3,%xmm2
+.Loop_enc1_15:
+.byte  102,15,56,220,209
        decl    %eax
-       movaps  (%rcx),%xmm5
+       movaps  (%rcx),%xmm1
        leaq    16(%rcx),%rcx
-       jnz     .Loop_enc1_5    
-.byte  102,15,56,221,197
-       subq    $16,%rdx
-       leaq    16(%rsi),%rsi
+       jnz     .Loop_enc1_15   
+.byte  102,15,56,221,209
        movl    %r10d,%eax
        movq    %r11,%rcx
-       movups  %xmm0,-16(%rsi)
+       movups  %xmm2,0(%rsi)
+       leaq    16(%rsi),%rsi
+       subq    $16,%rdx
        jnc     .Lcbc_enc_loop
        addq    $16,%rdx
        jnz     .Lcbc_enc_tail
-       movups  %xmm0,(%r8)
+       movups  %xmm2,(%r8)
        jmp     .Lcbc_ret
 
 .Lcbc_enc_tail:
@@ -435,113 +2060,261 @@ aesni_cbc_encrypt:
 
 .align 16
 .Lcbc_decrypt:
-       movups  (%r8),%xmm6
-       subq    $64,%rdx
+       movups  (%r8),%xmm9
        movl    %r10d,%eax
+       cmpq    $112,%rdx
        jbe     .Lcbc_dec_tail
-       jmp     .Lcbc_dec_loop3
+       shrl    $1,%r10d
+       subq    $112,%rdx
+       movl    %r10d,%eax
+       movaps  %xmm9,-24(%rsp)
+       jmp     .Lcbc_dec_loop8_enter
 .align 16
-.Lcbc_dec_loop3:
-       movups  (%rdi),%xmm0
-       movups  16(%rdi),%xmm1
-       movups  32(%rdi),%xmm2
-       movaps  %xmm0,%xmm7
-       movaps  %xmm1,%xmm8
-       movaps  %xmm2,%xmm9
-       call    _aesni_decrypt3
-       subq    $48,%rdx
-       leaq    48(%rdi),%rdi
-       leaq    48(%rsi),%rsi
-       pxor    %xmm6,%xmm0
-       pxor    %xmm7,%xmm1
-       movaps  %xmm9,%xmm6
-       pxor    %xmm8,%xmm2
-       movups  %xmm0,-48(%rsi)
+.Lcbc_dec_loop8:
+       movaps  %xmm0,-24(%rsp)
+       movups  %xmm9,(%rsi)
+       leaq    16(%rsi),%rsi
+.Lcbc_dec_loop8_enter:
+       movaps  (%rcx),%xmm0
+       movups  (%rdi),%xmm2
+       movups  16(%rdi),%xmm3
+       movaps  16(%rcx),%xmm1
+
+       leaq    32(%rcx),%rcx
+       movdqu  32(%rdi),%xmm4
+       xorps   %xmm0,%xmm2
+       movdqu  48(%rdi),%xmm5
+       xorps   %xmm0,%xmm3
+       movdqu  64(%rdi),%xmm6
+.byte  102,15,56,222,209
+       pxor    %xmm0,%xmm4
+       movdqu  80(%rdi),%xmm7
+.byte  102,15,56,222,217
+       pxor    %xmm0,%xmm5
+       movdqu  96(%rdi),%xmm8
+.byte  102,15,56,222,225
+       pxor    %xmm0,%xmm6
+       movdqu  112(%rdi),%xmm9
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+       decl    %eax
+.byte  102,15,56,222,241
+       pxor    %xmm0,%xmm8
+.byte  102,15,56,222,249
+       pxor    %xmm0,%xmm9
+       movaps  (%rcx),%xmm0
+.byte  102,68,15,56,222,193
+.byte  102,68,15,56,222,201
+       movaps  16(%rcx),%xmm1
+
+       call    .Ldec_loop8_enter
+
+       movups  (%rdi),%xmm1
+       movups  16(%rdi),%xmm0
+       xorps   -24(%rsp),%xmm2
+       xorps   %xmm1,%xmm3
+       movups  32(%rdi),%xmm1
+       xorps   %xmm0,%xmm4
+       movups  48(%rdi),%xmm0
+       xorps   %xmm1,%xmm5
+       movups  64(%rdi),%xmm1
+       xorps   %xmm0,%xmm6
+       movups  80(%rdi),%xmm0
+       xorps   %xmm1,%xmm7
+       movups  96(%rdi),%xmm1
+       xorps   %xmm0,%xmm8
+       movups  112(%rdi),%xmm0
+       xorps   %xmm1,%xmm9
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
        movl    %r10d,%eax
-       movups  %xmm1,-32(%rsi)
+       movups  %xmm6,64(%rsi)
        movq    %r11,%rcx
-       movups  %xmm2,-16(%rsi)
-       ja      .Lcbc_dec_loop3
+       movups  %xmm7,80(%rsi)
+       leaq    128(%rdi),%rdi
+       movups  %xmm8,96(%rsi)
+       leaq    112(%rsi),%rsi
+       subq    $128,%rdx
+       ja      .Lcbc_dec_loop8
 
+       movaps  %xmm9,%xmm2
+       movaps  %xmm0,%xmm9
+       addq    $112,%rdx
+       jle     .Lcbc_dec_tail_collected
+       movups  %xmm2,(%rsi)
+       leal    1(%r10,%r10,1),%eax
+       leaq    16(%rsi),%rsi
 .Lcbc_dec_tail:
-       addq    $64,%rdx
-       movups  %xmm6,(%r8)
-       jz      .Lcbc_dec_ret
-
-       movups  (%rdi),%xmm0
+       movups  (%rdi),%xmm2
+       movaps  %xmm2,%xmm8
        cmpq    $16,%rdx
-       movaps  %xmm0,%xmm7
        jbe     .Lcbc_dec_one
-       movups  16(%rdi),%xmm1
+
+       movups  16(%rdi),%xmm3
+       movaps  %xmm3,%xmm7
        cmpq    $32,%rdx
-       movaps  %xmm1,%xmm8
        jbe     .Lcbc_dec_two
-       movups  32(%rdi),%xmm2
+
+       movups  32(%rdi),%xmm4
+       movaps  %xmm4,%xmm6
        cmpq    $48,%rdx
-       movaps  %xmm2,%xmm9
        jbe     .Lcbc_dec_three
-       movups  48(%rdi),%xmm3
-       call    _aesni_decrypt4
-       pxor    %xmm6,%xmm0
-       movups  48(%rdi),%xmm6
-       pxor    %xmm7,%xmm1
-       movups  %xmm0,(%rsi)
-       pxor    %xmm8,%xmm2
-       movups  %xmm1,16(%rsi)
-       pxor    %xmm9,%xmm3
-       movups  %xmm2,32(%rsi)
-       movaps  %xmm3,%xmm0
-       leaq    48(%rsi),%rsi
+
+       movups  48(%rdi),%xmm5
+       cmpq    $64,%rdx
+       jbe     .Lcbc_dec_four
+
+       movups  64(%rdi),%xmm6
+       cmpq    $80,%rdx
+       jbe     .Lcbc_dec_five
+
+       movups  80(%rdi),%xmm7
+       cmpq    $96,%rdx
+       jbe     .Lcbc_dec_six
+
+       movups  96(%rdi),%xmm8
+       movaps  %xmm9,-24(%rsp)
+       call    _aesni_decrypt8
+       movups  (%rdi),%xmm1
+       movups  16(%rdi),%xmm0
+       xorps   -24(%rsp),%xmm2
+       xorps   %xmm1,%xmm3
+       movups  32(%rdi),%xmm1
+       xorps   %xmm0,%xmm4
+       movups  48(%rdi),%xmm0
+       xorps   %xmm1,%xmm5
+       movups  64(%rdi),%xmm1
+       xorps   %xmm0,%xmm6
+       movups  80(%rdi),%xmm0
+       xorps   %xmm1,%xmm7
+       movups  96(%rdi),%xmm9
+       xorps   %xmm0,%xmm8
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       movups  %xmm7,80(%rsi)
+       leaq    96(%rsi),%rsi
+       movaps  %xmm8,%xmm2
+       subq    $112,%rdx
        jmp     .Lcbc_dec_tail_collected
 .align 16
 .Lcbc_dec_one:
-       movaps  (%rcx),%xmm4
-       movaps  16(%rcx),%xmm5
+       movaps  (%rcx),%xmm0
+       movaps  16(%rcx),%xmm1
        leaq    32(%rcx),%rcx
-       pxor    %xmm4,%xmm0
-.Loop_dec1_6:
-.byte  102,15,56,222,197
+       xorps   %xmm0,%xmm2
+.Loop_dec1_16:
+.byte  102,15,56,222,209
        decl    %eax
-       movaps  (%rcx),%xmm5
+       movaps  (%rcx),%xmm1
        leaq    16(%rcx),%rcx
-       jnz     .Loop_dec1_6    
-.byte  102,15,56,223,197
-       pxor    %xmm6,%xmm0
-       movaps  %xmm7,%xmm6
+       jnz     .Loop_dec1_16   
+.byte  102,15,56,223,209
+       xorps   %xmm9,%xmm2
+       movaps  %xmm8,%xmm9
+       subq    $16,%rdx
        jmp     .Lcbc_dec_tail_collected
 .align 16
 .Lcbc_dec_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_decrypt3
-       pxor    %xmm6,%xmm0
-       pxor    %xmm7,%xmm1
-       movups  %xmm0,(%rsi)
-       movaps  %xmm8,%xmm6
-       movaps  %xmm1,%xmm0
+       xorps   %xmm9,%xmm2
+       xorps   %xmm8,%xmm3
+       movups  %xmm2,(%rsi)
+       movaps  %xmm7,%xmm9
+       movaps  %xmm3,%xmm2
        leaq    16(%rsi),%rsi
+       subq    $32,%rdx
        jmp     .Lcbc_dec_tail_collected
 .align 16
 .Lcbc_dec_three:
        call    _aesni_decrypt3
-       pxor    %xmm6,%xmm0
-       pxor    %xmm7,%xmm1
-       movups  %xmm0,(%rsi)
-       pxor    %xmm8,%xmm2
-       movups  %xmm1,16(%rsi)
-       movaps  %xmm9,%xmm6
-       movaps  %xmm2,%xmm0
+       xorps   %xmm9,%xmm2
+       xorps   %xmm8,%xmm3
+       movups  %xmm2,(%rsi)
+       xorps   %xmm7,%xmm4
+       movups  %xmm3,16(%rsi)
+       movaps  %xmm6,%xmm9
+       movaps  %xmm4,%xmm2
        leaq    32(%rsi),%rsi
+       subq    $48,%rdx
+       jmp     .Lcbc_dec_tail_collected
+.align 16
+.Lcbc_dec_four:
+       call    _aesni_decrypt4
+       xorps   %xmm9,%xmm2
+       movups  48(%rdi),%xmm9
+       xorps   %xmm8,%xmm3
+       movups  %xmm2,(%rsi)
+       xorps   %xmm7,%xmm4
+       movups  %xmm3,16(%rsi)
+       xorps   %xmm6,%xmm5
+       movups  %xmm4,32(%rsi)
+       movaps  %xmm5,%xmm2
+       leaq    48(%rsi),%rsi
+       subq    $64,%rdx
+       jmp     .Lcbc_dec_tail_collected
+.align 16
+.Lcbc_dec_five:
+       xorps   %xmm7,%xmm7
+       call    _aesni_decrypt6
+       movups  16(%rdi),%xmm1
+       movups  32(%rdi),%xmm0
+       xorps   %xmm9,%xmm2
+       xorps   %xmm8,%xmm3
+       xorps   %xmm1,%xmm4
+       movups  48(%rdi),%xmm1
+       xorps   %xmm0,%xmm5
+       movups  64(%rdi),%xmm9
+       xorps   %xmm1,%xmm6
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       leaq    64(%rsi),%rsi
+       movaps  %xmm6,%xmm2
+       subq    $80,%rdx
+       jmp     .Lcbc_dec_tail_collected
+.align 16
+.Lcbc_dec_six:
+       call    _aesni_decrypt6
+       movups  16(%rdi),%xmm1
+       movups  32(%rdi),%xmm0
+       xorps   %xmm9,%xmm2
+       xorps   %xmm8,%xmm3
+       xorps   %xmm1,%xmm4
+       movups  48(%rdi),%xmm1
+       xorps   %xmm0,%xmm5
+       movups  64(%rdi),%xmm0
+       xorps   %xmm1,%xmm6
+       movups  80(%rdi),%xmm9
+       xorps   %xmm0,%xmm7
+       movups  %xmm2,(%rsi)
+       movups  %xmm3,16(%rsi)
+       movups  %xmm4,32(%rsi)
+       movups  %xmm5,48(%rsi)
+       movups  %xmm6,64(%rsi)
+       leaq    80(%rsi),%rsi
+       movaps  %xmm7,%xmm2
+       subq    $96,%rdx
        jmp     .Lcbc_dec_tail_collected
 .align 16
 .Lcbc_dec_tail_collected:
        andq    $15,%rdx
-       movups  %xmm6,(%r8)
+       movups  %xmm9,(%r8)
        jnz     .Lcbc_dec_tail_partial
-       movups  %xmm0,(%rsi)
+       movups  %xmm2,(%rsi)
        jmp     .Lcbc_dec_ret
+.align 16
 .Lcbc_dec_tail_partial:
-       movaps  %xmm0,-24(%rsp)
+       movaps  %xmm2,-24(%rsp)
+       movq    $16,%rcx
        movq    %rsi,%rdi
-       movq    %rdx,%rcx
+       subq    %rdx,%rcx
        leaq    -24(%rsp),%rsi
 .long  0x9066A4F3      
 
@@ -554,7 +2327,7 @@ aesni_cbc_encrypt:
 .align 16
 aesni_set_decrypt_key:
 .byte  0x48,0x83,0xEC,0x08     
-       call    _aesni_set_encrypt_key
+       call    __aesni_set_encrypt_key
        shll    $4,%esi
        testl   %eax,%eax
        jnz     .Ldec_key_ret
@@ -574,9 +2347,9 @@ aesni_set_decrypt_key:
 .byte  102,15,56,219,201
        leaq    16(%rdx),%rdx
        leaq    -16(%rdi),%rdi
-       cmpq    %rdx,%rdi
        movaps  %xmm0,16(%rdi)
        movaps  %xmm1,-16(%rdx)
+       cmpq    %rdx,%rdi
        ja      .Ldec_key_inverse
 
        movaps  (%rdx),%xmm0
@@ -591,16 +2364,16 @@ aesni_set_decrypt_key:
 .type  aesni_set_encrypt_key,@function
 .align 16
 aesni_set_encrypt_key:
-_aesni_set_encrypt_key:
+__aesni_set_encrypt_key:
 .byte  0x48,0x83,0xEC,0x08     
-       testq   %rdi,%rdi
        movq    $-1,%rax
+       testq   %rdi,%rdi
        jz      .Lenc_key_ret
        testq   %rdx,%rdx
        jz      .Lenc_key_ret
 
        movups  (%rdi),%xmm0
-       pxor    %xmm4,%xmm4
+       xorps   %xmm4,%xmm4
        leaq    16(%rdx),%rax
        cmpl    $256,%esi
        je      .L14rounds
@@ -715,11 +2488,11 @@ _aesni_set_encrypt_key:
        leaq    16(%rax),%rax
 .Lkey_expansion_128_cold:
        shufps  $16,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
-       pshufd  $255,%xmm1,%xmm1
-       pxor    %xmm1,%xmm0
+       xorps   %xmm4,%xmm0
+       shufps  $255,%xmm1,%xmm1
+       xorps   %xmm1,%xmm0
        .byte   0xf3,0xc3
 
 .align 16
@@ -730,11 +2503,11 @@ _aesni_set_encrypt_key:
        movaps  %xmm2,%xmm5
 .Lkey_expansion_192b_warm:
        shufps  $16,%xmm0,%xmm4
-       movaps  %xmm2,%xmm3
-       pxor    %xmm4,%xmm0
+       movdqa  %xmm2,%xmm3
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
        pslldq  $4,%xmm3
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        pshufd  $85,%xmm1,%xmm1
        pxor    %xmm3,%xmm2
        pxor    %xmm1,%xmm0
@@ -758,11 +2531,11 @@ _aesni_set_encrypt_key:
        leaq    16(%rax),%rax
 .Lkey_expansion_256a_cold:
        shufps  $16,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
-       pshufd  $255,%xmm1,%xmm1
-       pxor    %xmm1,%xmm0
+       xorps   %xmm4,%xmm0
+       shufps  $255,%xmm1,%xmm1
+       xorps   %xmm1,%xmm0
        .byte   0xf3,0xc3
 
 .align 16
@@ -771,12 +2544,23 @@ _aesni_set_encrypt_key:
        leaq    16(%rax),%rax
 
        shufps  $16,%xmm2,%xmm4
-       pxor    %xmm4,%xmm2
+       xorps   %xmm4,%xmm2
        shufps  $140,%xmm2,%xmm4
-       pxor    %xmm4,%xmm2
-       pshufd  $170,%xmm1,%xmm1
-       pxor    %xmm1,%xmm2
+       xorps   %xmm4,%xmm2
+       shufps  $170,%xmm1,%xmm1
+       xorps   %xmm1,%xmm2
        .byte   0xf3,0xc3
 .size  aesni_set_encrypt_key,.-aesni_set_encrypt_key
+.size  __aesni_set_encrypt_key,.-__aesni_set_encrypt_key
+.align 64
+.Lbswap_mask:
+.byte  15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
+.Lincrement32:
+.long  6,6,6,0
+.Lincrement64:
+.long  1,0,0,0
+.Lxts_magic:
+.long  0x87,0,1,0
+
.byte  65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69,83,45,78,73,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0
 .align 64
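
The tweak-update sequences repeated throughout the XTS paths above (pxor, pcmpgtd, pshufd, pand, paddq, pxor against the .Lxts_magic constant 0x87,0,1,0) multiply the 128-bit tweak by x in GF(2^128), folding the carry back in with the reduction polynomial 0x87. As a rough orientation only, and not part of this patch, a byte-wise C sketch of one such tweak update (the helper name xts_double_tweak is hypothetical) could read:

#include <stdint.h>
#include <stddef.h>

/* Multiply the 128-bit XTS tweak (little-endian bytes) by x in GF(2^128). */
static void xts_double_tweak(uint8_t t[16])
{
        unsigned carry = 0;

        /* shift the whole 128-bit value left by one bit */
        for (size_t i = 0; i < 16; i++) {
                unsigned next_carry = t[i] >> 7;
                t[i] = (uint8_t)((t[i] << 1) | carry);
                carry = next_carry;
        }

        /* if a bit fell off the top, reduce with x^7 + x^2 + x + 1 (0x87) */
        if (carry)
                t[0] ^= 0x87;
}

The SSE code performs the same operation two 64-bit lanes at a time: paddq doubles each half, and the pcmpgtd/pshufd/pand mask injects 0x87 into the low lane and the inter-lane carry bit into the high lane.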
diff --git a/lib/accelerated/intel/asm/appro-aes-x86.s b/lib/accelerated/intel/asm/appro-aes-x86.s
index 981e356..88e76ae 100644
--- a/lib/accelerated/intel/asm/appro-aes-x86.s
+++ b/lib/accelerated/intel/asm/appro-aes-x86.s
@@ -5,18 +5,19 @@
 # modification, are permitted provided that the following conditions
 # are met:
 # 
-#     *        Redistributions of source code must retain copyright notices,
-#      this list of conditions and the following disclaimer.
+#     *        Redistributions of source code must retain copyright
+#     * notices,
+#      this list of conditions and the following disclaimer.
 #
-#     *        Redistributions in binary form must reproduce the above
-#      copyright notice, this list of conditions and the following
-#      disclaimer in the documentation and/or other materials
-#      provided with the distribution.
+#     *        Redistributions in binary form must reproduce the above
+#      copyright notice, this list of conditions and the following
+#      disclaimer in the documentation and/or other materials
+#      provided with the distribution.
 #
-#     *        Neither the name of the Andy Polyakov nor the names of its
-#      copyright holder and contributors may be used to endorse or
-#      promote products derived from this software without specific
-#      prior written permission.
+#     *        Neither the name of the Andy Polyakov nor the names of its
+#      copyright holder and contributors may be used to endorse or
+#      promote products derived from this software without specific
+#      prior written permission.
 #
 # ALTERNATIVELY, provided that this notice is retained in full, this
 # product may be distributed under the terms of the GNU General Public
@@ -44,21 +45,21 @@ aesni_encrypt:
 .L_aesni_encrypt_begin:
        movl    4(%esp),%eax
        movl    12(%esp),%edx
-       movups  (%eax),%xmm0
+       movups  (%eax),%xmm2
        movl    240(%edx),%ecx
        movl    8(%esp),%eax
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L000enc1_loop:
-       aesenc  %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
+.L000enc1_loop_1:
+.byte  102,15,56,220,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L000enc1_loop
-       aesenclast      %xmm4,%xmm0
-       movups  %xmm0,(%eax)
+       jnz     .L000enc1_loop_1
+.byte  102,15,56,221,209
+       movups  %xmm2,(%eax)
        ret
 .size  aesni_encrypt,.-.L_aesni_encrypt_begin
 .globl aesni_decrypt
@@ -68,165 +69,271 @@ aesni_decrypt:
 .L_aesni_decrypt_begin:
        movl    4(%esp),%eax
        movl    12(%esp),%edx
-       movups  (%eax),%xmm0
+       movups  (%eax),%xmm2
        movl    240(%edx),%ecx
        movl    8(%esp),%eax
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L001dec1_loop:
-       aesdec  %xmm4,%xmm0
+       xorps   %xmm0,%xmm2
+.L001dec1_loop_2:
+.byte  102,15,56,222,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L001dec1_loop
-       aesdeclast      %xmm4,%xmm0
-       movups  %xmm0,(%eax)
+       jnz     .L001dec1_loop_2
+.byte  102,15,56,223,209
+       movups  %xmm2,(%eax)
        ret
 .size  aesni_decrypt,.-.L_aesni_decrypt_begin
 .type  _aesni_encrypt3,@function
 .align 16
 _aesni_encrypt3:
-       movups  (%edx),%xmm3
+       movaps  (%edx),%xmm0
        shrl    $1,%ecx
-       movups  16(%edx),%xmm4
+       movaps  16(%edx),%xmm1
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-       pxor    %xmm3,%xmm1
-       pxor    %xmm3,%xmm2
-       jmp     .L002enc3_loop
-.align 16
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       pxor    %xmm0,%xmm4
+       movaps  (%edx),%xmm0
 .L002enc3_loop:
-       aesenc  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesenc  %xmm4,%xmm1
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
        decl    %ecx
-       aesenc  %xmm4,%xmm2
-       movups  16(%edx),%xmm4
-       aesenc  %xmm3,%xmm0
+.byte  102,15,56,220,225
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
        leal    32(%edx),%edx
-       aesenc  %xmm3,%xmm1
-       aesenc  %xmm3,%xmm2
+.byte  102,15,56,220,224
+       movaps  (%edx),%xmm0
        jnz     .L002enc3_loop
-       aesenc  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesenc  %xmm4,%xmm1
-       aesenc  %xmm4,%xmm2
-       aesenclast      %xmm3,%xmm0
-       aesenclast      %xmm3,%xmm1
-       aesenclast      %xmm3,%xmm2
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
        ret
 .size  _aesni_encrypt3,.-_aesni_encrypt3
 .type  _aesni_decrypt3,@function
 .align 16
 _aesni_decrypt3:
-       movups  (%edx),%xmm3
+       movaps  (%edx),%xmm0
        shrl    $1,%ecx
-       movups  16(%edx),%xmm4
+       movaps  16(%edx),%xmm1
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-       pxor    %xmm3,%xmm1
-       pxor    %xmm3,%xmm2
-       jmp     .L003dec3_loop
-.align 16
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       pxor    %xmm0,%xmm4
+       movaps  (%edx),%xmm0
 .L003dec3_loop:
-       aesdec  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesdec  %xmm4,%xmm1
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
        decl    %ecx
-       aesdec  %xmm4,%xmm2
-       movups  16(%edx),%xmm4
-       aesdec  %xmm3,%xmm0
+.byte  102,15,56,222,225
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
        leal    32(%edx),%edx
-       aesdec  %xmm3,%xmm1
-       aesdec  %xmm3,%xmm2
+.byte  102,15,56,222,224
+       movaps  (%edx),%xmm0
        jnz     .L003dec3_loop
-       aesdec  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesdec  %xmm4,%xmm1
-       aesdec  %xmm4,%xmm2
-       aesdeclast      %xmm3,%xmm0
-       aesdeclast      %xmm3,%xmm1
-       aesdeclast      %xmm3,%xmm2
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
        ret
 .size  _aesni_decrypt3,.-_aesni_decrypt3
 .type  _aesni_encrypt4,@function
 .align 16
 _aesni_encrypt4:
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
        shrl    $1,%ecx
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-       pxor    %xmm3,%xmm1
-       pxor    %xmm3,%xmm2
-       pxor    %xmm3,%xmm7
-       jmp     .L004enc3_loop
-.align 16
-.L004enc3_loop:
-       aesenc  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesenc  %xmm4,%xmm1
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       pxor    %xmm0,%xmm4
+       pxor    %xmm0,%xmm5
+       movaps  (%edx),%xmm0
+.L004enc4_loop:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
        decl    %ecx
-       aesenc  %xmm4,%xmm2
-       aesenc  %xmm4,%xmm7
-       movups  16(%edx),%xmm4
-       aesenc  %xmm3,%xmm0
-       leal    32(%edx),%edx
-       aesenc  %xmm3,%xmm1
-       aesenc  %xmm3,%xmm2
-       aesenc  %xmm3,%xmm7
-       jnz     .L004enc3_loop
-       aesenc  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesenc  %xmm4,%xmm1
-       aesenc  %xmm4,%xmm2
-       aesenc  %xmm4,%xmm7
-       aesenclast      %xmm3,%xmm0
-       aesenclast      %xmm3,%xmm1
-       aesenclast      %xmm3,%xmm2
-       aesenclast      %xmm3,%xmm7
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leal    32(%edx),%edx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+       movaps  (%edx),%xmm0
+       jnz     .L004enc4_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
+.byte  102,15,56,221,232
        ret
 .size  _aesni_encrypt4,.-_aesni_encrypt4
 .type  _aesni_decrypt4,@function
 .align 16
 _aesni_decrypt4:
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
        shrl    $1,%ecx
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-       pxor    %xmm3,%xmm1
-       pxor    %xmm3,%xmm2
-       pxor    %xmm3,%xmm7
-       jmp     .L005dec3_loop
-.align 16
-.L005dec3_loop:
-       aesdec  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesdec  %xmm4,%xmm1
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+       pxor    %xmm0,%xmm4
+       pxor    %xmm0,%xmm5
+       movaps  (%edx),%xmm0
+.L005dec4_loop:
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
        decl    %ecx
-       aesdec  %xmm4,%xmm2
-       aesdec  %xmm4,%xmm7
-       movups  16(%edx),%xmm4
-       aesdec  %xmm3,%xmm0
-       leal    32(%edx),%edx
-       aesdec  %xmm3,%xmm1
-       aesdec  %xmm3,%xmm2
-       aesdec  %xmm3,%xmm7
-       jnz     .L005dec3_loop
-       aesdec  %xmm4,%xmm0
-       movups  (%edx),%xmm3
-       aesdec  %xmm4,%xmm1
-       aesdec  %xmm4,%xmm2
-       aesdec  %xmm4,%xmm7
-       aesdeclast      %xmm3,%xmm0
-       aesdeclast      %xmm3,%xmm1
-       aesdeclast      %xmm3,%xmm2
-       aesdeclast      %xmm3,%xmm7
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leal    32(%edx),%edx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+       movaps  (%edx),%xmm0
+       jnz     .L005dec4_loop
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
+.byte  102,15,56,223,232
        ret
 .size  _aesni_decrypt4,.-_aesni_decrypt4
+.type  _aesni_encrypt6,@function
+.align 16
+_aesni_encrypt6:
+       movaps  (%edx),%xmm0
+       shrl    $1,%ecx
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,220,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,220,217
+       pxor    %xmm0,%xmm5
+       decl    %ecx
+.byte  102,15,56,220,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+.byte  102,15,56,220,241
+       movaps  (%edx),%xmm0
+.byte  102,15,56,220,249
+       jmp     .L_aesni_encrypt6_enter
+.align 16
+.L006enc6_loop:
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+       decl    %ecx
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.align 16
+.L_aesni_encrypt6_enter:
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,220,208
+.byte  102,15,56,220,216
+       leal    32(%edx),%edx
+.byte  102,15,56,220,224
+.byte  102,15,56,220,232
+.byte  102,15,56,220,240
+.byte  102,15,56,220,248
+       movaps  (%edx),%xmm0
+       jnz     .L006enc6_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,220,225
+.byte  102,15,56,220,233
+.byte  102,15,56,220,241
+.byte  102,15,56,220,249
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+.byte  102,15,56,221,224
+.byte  102,15,56,221,232
+.byte  102,15,56,221,240
+.byte  102,15,56,221,248
+       ret
+.size  _aesni_encrypt6,.-_aesni_encrypt6
+.type  _aesni_decrypt6,@function
+.align 16
+_aesni_decrypt6:
+       movaps  (%edx),%xmm0
+       shrl    $1,%ecx
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,222,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,222,217
+       pxor    %xmm0,%xmm5
+       decl    %ecx
+.byte  102,15,56,222,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+.byte  102,15,56,222,241
+       movaps  (%edx),%xmm0
+.byte  102,15,56,222,249
+       jmp     .L_aesni_decrypt6_enter
+.align 16
+.L007dec6_loop:
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+       decl    %ecx
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.align 16
+.L_aesni_decrypt6_enter:
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,222,208
+.byte  102,15,56,222,216
+       leal    32(%edx),%edx
+.byte  102,15,56,222,224
+.byte  102,15,56,222,232
+.byte  102,15,56,222,240
+.byte  102,15,56,222,248
+       movaps  (%edx),%xmm0
+       jnz     .L007dec6_loop
+.byte  102,15,56,222,209
+.byte  102,15,56,222,217
+.byte  102,15,56,222,225
+.byte  102,15,56,222,233
+.byte  102,15,56,222,241
+.byte  102,15,56,222,249
+.byte  102,15,56,223,208
+.byte  102,15,56,223,216
+.byte  102,15,56,223,224
+.byte  102,15,56,223,232
+.byte  102,15,56,223,240
+.byte  102,15,56,223,248
+       ret
+.size  _aesni_decrypt6,.-_aesni_decrypt6
 .globl aesni_ecb_encrypt
 .type  aesni_ecb_encrypt,@function
 .align 16
@@ -240,153 +347,1364 @@ aesni_ecb_encrypt:
        movl    24(%esp),%edi
        movl    28(%esp),%eax
        movl    32(%esp),%edx
-       movl    36(%esp),%ecx
-       cmpl    $16,%eax
-       jb      .L006ecb_ret
+       movl    36(%esp),%ebx
        andl    $-16,%eax
-       testl   %ecx,%ecx
+       jz      .L008ecb_ret
        movl    240(%edx),%ecx
+       testl   %ebx,%ebx
+       jz      .L009ecb_decrypt
        movl    %edx,%ebp
        movl    %ecx,%ebx
-       jz      .L007ecb_decrypt
-       subl    $64,%eax
-       jbe     .L008ecb_enc_tail
-       jmp     .L009ecb_enc_loop3
+       cmpl    $96,%eax
+       jb      .L010ecb_enc_tail
+       movdqu  (%esi),%xmm2
+       movdqu  16(%esi),%xmm3
+       movdqu  32(%esi),%xmm4
+       movdqu  48(%esi),%xmm5
+       movdqu  64(%esi),%xmm6
+       movdqu  80(%esi),%xmm7
+       leal    96(%esi),%esi
+       subl    $96,%eax
+       jmp     .L011ecb_enc_loop6_enter
 .align 16
-.L009ecb_enc_loop3:
-       movups  (%esi),%xmm0
-       movups  16(%esi),%xmm1
-       movups  32(%esi),%xmm2
-       call    _aesni_encrypt3
-       subl    $48,%eax
-       leal    48(%esi),%esi
-       leal    48(%edi),%edi
-       movups  %xmm0,-48(%edi)
+.L012ecb_enc_loop6:
+       movups  %xmm2,(%edi)
+       movdqu  (%esi),%xmm2
+       movups  %xmm3,16(%edi)
+       movdqu  16(%esi),%xmm3
+       movups  %xmm4,32(%edi)
+       movdqu  32(%esi),%xmm4
+       movups  %xmm5,48(%edi)
+       movdqu  48(%esi),%xmm5
+       movups  %xmm6,64(%edi)
+       movdqu  64(%esi),%xmm6
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       movdqu  80(%esi),%xmm7
+       leal    96(%esi),%esi
+.L011ecb_enc_loop6_enter:
+       call    _aesni_encrypt6
        movl    %ebp,%edx
-       movups  %xmm1,-32(%edi)
        movl    %ebx,%ecx
-       movups  %xmm2,-16(%edi)
-       ja      .L009ecb_enc_loop3
-.L008ecb_enc_tail:
-       addl    $64,%eax
-       jz      .L006ecb_ret
-       cmpl    $16,%eax
-       movups  (%esi),%xmm0
-       je      .L010ecb_enc_one
+       subl    $96,%eax
+       jnc     .L012ecb_enc_loop6
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       addl    $96,%eax
+       jz      .L008ecb_ret
+.L010ecb_enc_tail:
+       movups  (%esi),%xmm2
        cmpl    $32,%eax
-       movups  16(%esi),%xmm1
-       je      .L011ecb_enc_two
-       cmpl    $48,%eax
-       movups  32(%esi),%xmm2
-       je      .L012ecb_enc_three
-       movups  48(%esi),%xmm7
-       call    _aesni_encrypt4
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       movups  %xmm2,32(%edi)
-       movups  %xmm7,48(%edi)
-       jmp     .L006ecb_ret
-.align 16
-.L010ecb_enc_one:
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
-       leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L013enc1_loop:
-       aesenc  %xmm4,%xmm0
+       jb      .L013ecb_enc_one
+       movups  16(%esi),%xmm3
+       je      .L014ecb_enc_two
+       movups  32(%esi),%xmm4
+       cmpl    $64,%eax
+       jb      .L015ecb_enc_three
+       movups  48(%esi),%xmm5
+       je      .L016ecb_enc_four
+       movups  64(%esi),%xmm6
+       xorps   %xmm7,%xmm7
+       call    _aesni_encrypt6
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       jmp     .L008ecb_ret
+.align 16
+.L013ecb_enc_one:
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L017enc1_loop_3:
+.byte  102,15,56,220,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L013enc1_loop
-       aesenclast      %xmm4,%xmm0
-       movups  %xmm0,(%edi)
-       jmp     .L006ecb_ret
+       jnz     .L017enc1_loop_3
+.byte  102,15,56,221,209
+       movups  %xmm2,(%edi)
+       jmp     .L008ecb_ret
 .align 16
-.L011ecb_enc_two:
+.L014ecb_enc_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_encrypt3
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       jmp     .L006ecb_ret
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       jmp     .L008ecb_ret
 .align 16
-.L012ecb_enc_three:
+.L015ecb_enc_three:
        call    _aesni_encrypt3
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       movups  %xmm2,32(%edi)
-       jmp     .L006ecb_ret
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       jmp     .L008ecb_ret
 .align 16
-.L007ecb_decrypt:
-       subl    $64,%eax
-       jbe     .L014ecb_dec_tail
-       jmp     .L015ecb_dec_loop3
+.L016ecb_enc_four:
+       call    _aesni_encrypt4
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       jmp     .L008ecb_ret
 .align 16
-.L015ecb_dec_loop3:
-       movups  (%esi),%xmm0
-       movups  16(%esi),%xmm1
-       movups  32(%esi),%xmm2
+.L009ecb_decrypt:
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       cmpl    $96,%eax
+       jb      .L018ecb_dec_tail
+       movdqu  (%esi),%xmm2
+       movdqu  16(%esi),%xmm3
+       movdqu  32(%esi),%xmm4
+       movdqu  48(%esi),%xmm5
+       movdqu  64(%esi),%xmm6
+       movdqu  80(%esi),%xmm7
+       leal    96(%esi),%esi
+       subl    $96,%eax
+       jmp     .L019ecb_dec_loop6_enter
+.align 16
+.L020ecb_dec_loop6:
+       movups  %xmm2,(%edi)
+       movdqu  (%esi),%xmm2
+       movups  %xmm3,16(%edi)
+       movdqu  16(%esi),%xmm3
+       movups  %xmm4,32(%edi)
+       movdqu  32(%esi),%xmm4
+       movups  %xmm5,48(%edi)
+       movdqu  48(%esi),%xmm5
+       movups  %xmm6,64(%edi)
+       movdqu  64(%esi),%xmm6
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       movdqu  80(%esi),%xmm7
+       leal    96(%esi),%esi
+.L019ecb_dec_loop6_enter:
+       call    _aesni_decrypt6
+       movl    %ebp,%edx
+       movl    %ebx,%ecx
+       subl    $96,%eax
+       jnc     .L020ecb_dec_loop6
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       addl    $96,%eax
+       jz      .L008ecb_ret
+.L018ecb_dec_tail:
+       movups  (%esi),%xmm2
+       cmpl    $32,%eax
+       jb      .L021ecb_dec_one
+       movups  16(%esi),%xmm3
+       je      .L022ecb_dec_two
+       movups  32(%esi),%xmm4
+       cmpl    $64,%eax
+       jb      .L023ecb_dec_three
+       movups  48(%esi),%xmm5
+       je      .L024ecb_dec_four
+       movups  64(%esi),%xmm6
+       xorps   %xmm7,%xmm7
+       call    _aesni_decrypt6
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       jmp     .L008ecb_ret
+.align 16
+.L021ecb_dec_one:
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L025dec1_loop_4:
+.byte  102,15,56,222,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L025dec1_loop_4
+.byte  102,15,56,223,209
+       movups  %xmm2,(%edi)
+       jmp     .L008ecb_ret
+.align 16
+.L022ecb_dec_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_decrypt3
-       subl    $48,%eax
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       jmp     .L008ecb_ret
+.align 16
+.L023ecb_dec_three:
+       call    _aesni_decrypt3
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       jmp     .L008ecb_ret
+.align 16
+.L024ecb_dec_four:
+       call    _aesni_decrypt4
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+.L008ecb_ret:
+       popl    %edi
+       popl    %esi
+       popl    %ebx
+       popl    %ebp
+       ret
+.size  aesni_ecb_encrypt,.-.L_aesni_ecb_encrypt_begin
+.globl aesni_ccm64_encrypt_blocks
+.type  aesni_ccm64_encrypt_blocks,@function
+.align 16
+aesni_ccm64_encrypt_blocks:
+.L_aesni_ccm64_encrypt_blocks_begin:
+       pushl   %ebp
+       pushl   %ebx
+       pushl   %esi
+       pushl   %edi
+       movl    20(%esp),%esi
+       movl    24(%esp),%edi
+       movl    28(%esp),%eax
+       movl    32(%esp),%edx
+       movl    36(%esp),%ebx
+       movl    40(%esp),%ecx
+       movl    %esp,%ebp
+       subl    $60,%esp
+       andl    $-16,%esp
+       movl    %ebp,48(%esp)
+       movdqu  (%ebx),%xmm7
+       movdqu  (%ecx),%xmm3
+       movl    $202182159,(%esp)
+       movl    $134810123,4(%esp)
+       movl    $67438087,8(%esp)
+       movl    $66051,12(%esp)
+       movl    $1,%ecx
+       xorl    %ebp,%ebp
+       movl    %ecx,16(%esp)
+       movl    %ebp,20(%esp)
+       movl    %ebp,24(%esp)
+       movl    %ebp,28(%esp)
+       movdqa  (%esp),%xmm5
+.byte  102,15,56,0,253
+       movl    240(%edx),%ecx
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       movdqa  %xmm7,%xmm2
+.L026ccm64_enc_outer:
+       movups  (%esi),%xmm6
+.byte  102,15,56,0,213
+       movl    %ebp,%edx
+       movl    %ebx,%ecx
+       movaps  (%edx),%xmm0
+       shrl    $1,%ecx
+       movaps  16(%edx),%xmm1
+       xorps   %xmm0,%xmm6
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm6,%xmm3
+       movaps  (%edx),%xmm0
+.L027ccm64_enc2_loop:
+.byte  102,15,56,220,209
+       decl    %ecx
+.byte  102,15,56,220,217
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,220,208
+       leal    32(%edx),%edx
+.byte  102,15,56,220,216
+       movaps  (%edx),%xmm0
+       jnz     .L027ccm64_enc2_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+       paddq   16(%esp),%xmm7
+       decl    %eax
+       leal    16(%esi),%esi
+       xorps   %xmm2,%xmm6
+       movdqa  %xmm7,%xmm2
+       movups  %xmm6,(%edi)
+       leal    16(%edi),%edi
+       jnz     .L026ccm64_enc_outer
+       movl    48(%esp),%esp
+       movl    40(%esp),%edi
+       movups  %xmm3,(%edi)
+       popl    %edi
+       popl    %esi
+       popl    %ebx
+       popl    %ebp
+       ret
+.size  aesni_ccm64_encrypt_blocks,.-.L_aesni_ccm64_encrypt_blocks_begin
+.globl aesni_ccm64_decrypt_blocks
+.type  aesni_ccm64_decrypt_blocks,@function
+.align 16
+aesni_ccm64_decrypt_blocks:
+.L_aesni_ccm64_decrypt_blocks_begin:
+       pushl   %ebp
+       pushl   %ebx
+       pushl   %esi
+       pushl   %edi
+       movl    20(%esp),%esi
+       movl    24(%esp),%edi
+       movl    28(%esp),%eax
+       movl    32(%esp),%edx
+       movl    36(%esp),%ebx
+       movl    40(%esp),%ecx
+       movl    %esp,%ebp
+       subl    $60,%esp
+       andl    $-16,%esp
+       movl    %ebp,48(%esp)
+       movdqu  (%ebx),%xmm7
+       movdqu  (%ecx),%xmm3
+       movl    $202182159,(%esp)
+       movl    $134810123,4(%esp)
+       movl    $67438087,8(%esp)
+       movl    $66051,12(%esp)
+       movl    $1,%ecx
+       xorl    %ebp,%ebp
+       movl    %ecx,16(%esp)
+       movl    %ebp,20(%esp)
+       movl    %ebp,24(%esp)
+       movl    %ebp,28(%esp)
+       movdqa  (%esp),%xmm5
+       movdqa  %xmm7,%xmm2
+.byte  102,15,56,0,253
+       movl    240(%edx),%ecx
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L028enc1_loop_5:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L028enc1_loop_5
+.byte  102,15,56,221,209
+.L029ccm64_dec_outer:
+       paddq   16(%esp),%xmm7
+       movups  (%esi),%xmm6
+       xorps   %xmm2,%xmm6
+       movdqa  %xmm7,%xmm2
+       leal    16(%esi),%esi
+.byte  102,15,56,0,213
+       movl    %ebp,%edx
+       movl    %ebx,%ecx
+       movups  %xmm6,(%edi)
+       leal    16(%edi),%edi
+       subl    $1,%eax
+       jz      .L030ccm64_dec_break
+       movaps  (%edx),%xmm0
+       shrl    $1,%ecx
+       movaps  16(%edx),%xmm1
+       xorps   %xmm0,%xmm6
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+       xorps   %xmm6,%xmm3
+       movaps  (%edx),%xmm0
+.L031ccm64_dec2_loop:
+.byte  102,15,56,220,209
+       decl    %ecx
+.byte  102,15,56,220,217
+       movaps  16(%edx),%xmm1
+.byte  102,15,56,220,208
+       leal    32(%edx),%edx
+.byte  102,15,56,220,216
+       movaps  (%edx),%xmm0
+       jnz     .L031ccm64_dec2_loop
+.byte  102,15,56,220,209
+.byte  102,15,56,220,217
+.byte  102,15,56,221,208
+.byte  102,15,56,221,216
+       jmp     .L029ccm64_dec_outer
+.align 16
+.L030ccm64_dec_break:
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       xorps   %xmm0,%xmm6
+       leal    32(%edx),%edx
+       xorps   %xmm6,%xmm3
+.L032enc1_loop_6:
+.byte  102,15,56,220,217
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L032enc1_loop_6
+.byte  102,15,56,221,217
+       movl    48(%esp),%esp
+       movl    40(%esp),%edi
+       movups  %xmm3,(%edi)
+       popl    %edi
+       popl    %esi
+       popl    %ebx
+       popl    %ebp
+       ret
+.size  aesni_ccm64_decrypt_blocks,.-.L_aesni_ccm64_decrypt_blocks_begin
+.globl aesni_ctr32_encrypt_blocks
+.type  aesni_ctr32_encrypt_blocks,@function
+.align 16
+aesni_ctr32_encrypt_blocks:
+.L_aesni_ctr32_encrypt_blocks_begin:
+       pushl   %ebp
+       pushl   %ebx
+       pushl   %esi
+       pushl   %edi
+       movl    20(%esp),%esi
+       movl    24(%esp),%edi
+       movl    28(%esp),%eax
+       movl    32(%esp),%edx
+       movl    36(%esp),%ebx
+       movl    %esp,%ebp
+       subl    $88,%esp
+       andl    $-16,%esp
+       movl    %ebp,80(%esp)
+       cmpl    $1,%eax
+       je      .L033ctr32_one_shortcut
+       movdqu  (%ebx),%xmm7
+       movl    $202182159,(%esp)
+       movl    $134810123,4(%esp)
+       movl    $67438087,8(%esp)
+       movl    $66051,12(%esp)
+       movl    $6,%ecx
+       xorl    %ebp,%ebp
+       movl    %ecx,16(%esp)
+       movl    %ecx,20(%esp)
+       movl    %ecx,24(%esp)
+       movl    %ebp,28(%esp)
+.byte  102,15,58,22,251,3
+.byte  102,15,58,34,253,3
+       movl    240(%edx),%ecx
+       bswap   %ebx
+       pxor    %xmm1,%xmm1
+       pxor    %xmm0,%xmm0
+       movdqa  (%esp),%xmm2
+.byte  102,15,58,34,203,0
+       leal    3(%ebx),%ebp
+.byte  102,15,58,34,197,0
+       incl    %ebx
+.byte  102,15,58,34,203,1
+       incl    %ebp
+.byte  102,15,58,34,197,1
+       incl    %ebx
+.byte  102,15,58,34,203,2
+       incl    %ebp
+.byte  102,15,58,34,197,2
+       movdqa  %xmm1,48(%esp)
+.byte  102,15,56,0,202
+       movdqa  %xmm0,64(%esp)
+.byte  102,15,56,0,194
+       pshufd  $192,%xmm1,%xmm2
+       pshufd  $128,%xmm1,%xmm3
+       cmpl    $6,%eax
+       jb      .L034ctr32_tail
+       movdqa  %xmm7,32(%esp)
+       shrl    $1,%ecx
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       subl    $6,%eax
+       jmp     .L035ctr32_loop6
+.align 16
+.L035ctr32_loop6:
+       pshufd  $64,%xmm1,%xmm4
+       movdqa  32(%esp),%xmm1
+       pshufd  $192,%xmm0,%xmm5
+       por     %xmm1,%xmm2
+       pshufd  $128,%xmm0,%xmm6
+       por     %xmm1,%xmm3
+       pshufd  $64,%xmm0,%xmm7
+       por     %xmm1,%xmm4
+       por     %xmm1,%xmm5
+       por     %xmm1,%xmm6
+       por     %xmm1,%xmm7
+       movaps  (%ebp),%xmm0
+       movaps  16(%ebp),%xmm1
+       leal    32(%ebp),%edx
+       decl    %ecx
+       pxor    %xmm0,%xmm2
+       pxor    %xmm0,%xmm3
+.byte  102,15,56,220,209
+       pxor    %xmm0,%xmm4
+.byte  102,15,56,220,217
+       pxor    %xmm0,%xmm5
+.byte  102,15,56,220,225
+       pxor    %xmm0,%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+.byte  102,15,56,220,241
+       movaps  (%edx),%xmm0
+.byte  102,15,56,220,249
+       call    .L_aesni_encrypt6_enter
+       movups  (%esi),%xmm1
+       movups  16(%esi),%xmm0
+       xorps   %xmm1,%xmm2
+       movups  32(%esi),%xmm1
+       xorps   %xmm0,%xmm3
+       movups  %xmm2,(%edi)
+       movdqa  16(%esp),%xmm0
+       xorps   %xmm1,%xmm4
+       movdqa  48(%esp),%xmm1
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       paddd   %xmm0,%xmm1
+       paddd   64(%esp),%xmm0
+       movdqa  (%esp),%xmm2
+       movups  48(%esi),%xmm3
+       movups  64(%esi),%xmm4
+       xorps   %xmm3,%xmm5
+       movups  80(%esi),%xmm3
+       leal    96(%esi),%esi
+       movdqa  %xmm1,48(%esp)
+.byte  102,15,56,0,202
+       xorps   %xmm4,%xmm6
+       movups  %xmm5,48(%edi)
+       xorps   %xmm3,%xmm7
+       movdqa  %xmm0,64(%esp)
+.byte  102,15,56,0,194
+       movups  %xmm6,64(%edi)
+       pshufd  $192,%xmm1,%xmm2
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       movl    %ebx,%ecx
+       pshufd  $128,%xmm1,%xmm3
+       subl    $6,%eax
+       jnc     .L035ctr32_loop6
+       addl    $6,%eax
+       jz      .L036ctr32_ret
+       movl    %ebp,%edx
+       leal    1(,%ecx,2),%ecx
+       movdqa  32(%esp),%xmm7
+.L034ctr32_tail:
+       por     %xmm7,%xmm2
+       cmpl    $2,%eax
+       jb      .L037ctr32_one
+       pshufd  $64,%xmm1,%xmm4
+       por     %xmm7,%xmm3
+       je      .L038ctr32_two
+       pshufd  $192,%xmm0,%xmm5
+       por     %xmm7,%xmm4
+       cmpl    $4,%eax
+       jb      .L039ctr32_three
+       pshufd  $128,%xmm0,%xmm6
+       por     %xmm7,%xmm5
+       je      .L040ctr32_four
+       por     %xmm7,%xmm6
+       call    _aesni_encrypt6
+       movups  (%esi),%xmm1
+       movups  16(%esi),%xmm0
+       xorps   %xmm1,%xmm2
+       movups  32(%esi),%xmm1
+       xorps   %xmm0,%xmm3
+       movups  48(%esi),%xmm0
+       xorps   %xmm1,%xmm4
+       movups  64(%esi),%xmm1
+       xorps   %xmm0,%xmm5
+       movups  %xmm2,(%edi)
+       xorps   %xmm1,%xmm6
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       jmp     .L036ctr32_ret
+.align 16
+.L033ctr32_one_shortcut:
+       movups  (%ebx),%xmm2
+       movl    240(%edx),%ecx
+.L037ctr32_one:
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L041enc1_loop_7:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L041enc1_loop_7
+.byte  102,15,56,221,209
+       movups  (%esi),%xmm6
+       xorps   %xmm2,%xmm6
+       movups  %xmm6,(%edi)
+       jmp     .L036ctr32_ret
+.align 16
+.L038ctr32_two:
+       call    _aesni_encrypt3
+       movups  (%esi),%xmm5
+       movups  16(%esi),%xmm6
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       jmp     .L036ctr32_ret
+.align 16
+.L039ctr32_three:
+       call    _aesni_encrypt3
+       movups  (%esi),%xmm5
+       movups  16(%esi),%xmm6
+       xorps   %xmm5,%xmm2
+       movups  32(%esi),%xmm7
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       xorps   %xmm7,%xmm4
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       jmp     .L036ctr32_ret
+.align 16
+.L040ctr32_four:
+       call    _aesni_encrypt4
+       movups  (%esi),%xmm6
+       movups  16(%esi),%xmm7
+       movups  32(%esi),%xmm1
+       xorps   %xmm6,%xmm2
+       movups  48(%esi),%xmm0
+       xorps   %xmm7,%xmm3
+       movups  %xmm2,(%edi)
+       xorps   %xmm1,%xmm4
+       movups  %xmm3,16(%edi)
+       xorps   %xmm0,%xmm5
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+.L036ctr32_ret:
+       movl    80(%esp),%esp
+       popl    %edi
+       popl    %esi
+       popl    %ebx
+       popl    %ebp
+       ret
+.size  aesni_ctr32_encrypt_blocks,.-.L_aesni_ctr32_encrypt_blocks_begin
+.globl aesni_xts_encrypt
+.type  aesni_xts_encrypt,@function
+.align 16
+aesni_xts_encrypt:
+.L_aesni_xts_encrypt_begin:
+       pushl   %ebp
+       pushl   %ebx
+       pushl   %esi
+       pushl   %edi
+       movl    36(%esp),%edx
+       movl    40(%esp),%esi
+       movl    240(%edx),%ecx
+       movups  (%esi),%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L042enc1_loop_8:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L042enc1_loop_8
+.byte  102,15,56,221,209
+       movl    20(%esp),%esi
+       movl    24(%esp),%edi
+       movl    28(%esp),%eax
+       movl    32(%esp),%edx
+       movl    %esp,%ebp
+       subl    $120,%esp
+       movl    240(%edx),%ecx
+       andl    $-16,%esp
+       movl    $135,96(%esp)
+       movl    $0,100(%esp)
+       movl    $1,104(%esp)
+       movl    $0,108(%esp)
+       movl    %eax,112(%esp)
+       movl    %ebp,116(%esp)
+       movdqa  %xmm2,%xmm1
+       pxor    %xmm0,%xmm0
+       movdqa  96(%esp),%xmm3
+       pcmpgtd %xmm1,%xmm0
+       andl    $-16,%eax
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       subl    $96,%eax
+       jc      .L043xts_enc_short
+       shrl    $1,%ecx
+       movl    %ecx,%ebx
+       jmp     .L044xts_enc_loop6
+.align 16
+.L044xts_enc_loop6:
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,16(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,32(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,48(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm7
+       movdqa  %xmm1,64(%esp)
+       paddq   %xmm1,%xmm1
+       movaps  (%ebp),%xmm0
+       pand    %xmm3,%xmm7
+       movups  (%esi),%xmm2
+       pxor    %xmm1,%xmm7
+       movdqu  16(%esi),%xmm3
+       xorps   %xmm0,%xmm2
+       movdqu  32(%esi),%xmm4
+       pxor    %xmm0,%xmm3
+       movdqu  48(%esi),%xmm5
+       pxor    %xmm0,%xmm4
+       movdqu  64(%esi),%xmm6
+       pxor    %xmm0,%xmm5
+       movdqu  80(%esi),%xmm1
+       pxor    %xmm0,%xmm6
+       leal    96(%esi),%esi
+       pxor    (%esp),%xmm2
+       movdqa  %xmm7,80(%esp)
+       pxor    %xmm1,%xmm7
+       movaps  16(%ebp),%xmm1
+       leal    32(%ebp),%edx
+       pxor    16(%esp),%xmm3
+.byte  102,15,56,220,209
+       pxor    32(%esp),%xmm4
+.byte  102,15,56,220,217
+       pxor    48(%esp),%xmm5
+       decl    %ecx
+.byte  102,15,56,220,225
+       pxor    64(%esp),%xmm6
+.byte  102,15,56,220,233
+       pxor    %xmm0,%xmm7
+.byte  102,15,56,220,241
+       movaps  (%edx),%xmm0
+.byte  102,15,56,220,249
+       call    .L_aesni_encrypt6_enter
+       movdqa  80(%esp),%xmm1
+       pxor    %xmm0,%xmm0
+       xorps   (%esp),%xmm2
+       pcmpgtd %xmm1,%xmm0
+       xorps   16(%esp),%xmm3
+       movups  %xmm2,(%edi)
+       xorps   32(%esp),%xmm4
+       movups  %xmm3,16(%edi)
+       xorps   48(%esp),%xmm5
+       movups  %xmm4,32(%edi)
+       xorps   64(%esp),%xmm6
+       movups  %xmm5,48(%edi)
+       xorps   %xmm1,%xmm7
+       movups  %xmm6,64(%edi)
+       pshufd  $19,%xmm0,%xmm2
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       movdqa  96(%esp),%xmm3
+       pxor    %xmm0,%xmm0
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       movl    %ebx,%ecx
+       pxor    %xmm2,%xmm1
+       subl    $96,%eax
+       jnc     .L044xts_enc_loop6
+       leal    1(,%ecx,2),%ecx
+       movl    %ebp,%edx
+       movl    %ecx,%ebx
+.L043xts_enc_short:
+       addl    $96,%eax
+       jz      .L045xts_enc_done6x
+       movdqa  %xmm1,%xmm5
+       cmpl    $32,%eax
+       jb      .L046xts_enc_one
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       je      .L047xts_enc_two
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,%xmm6
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       cmpl    $64,%eax
+       jb      .L048xts_enc_three
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,%xmm7
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       movdqa  %xmm5,(%esp)
+       movdqa  %xmm6,16(%esp)
+       je      .L049xts_enc_four
+       movdqa  %xmm7,32(%esp)
+       pshufd  $19,%xmm0,%xmm7
+       movdqa  %xmm1,48(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm7
+       pxor    %xmm1,%xmm7
+       movdqu  (%esi),%xmm2
+       movdqu  16(%esi),%xmm3
+       movdqu  32(%esi),%xmm4
+       pxor    (%esp),%xmm2
+       movdqu  48(%esi),%xmm5
+       pxor    16(%esp),%xmm3
+       movdqu  64(%esi),%xmm6
+       pxor    32(%esp),%xmm4
+       leal    80(%esi),%esi
+       pxor    48(%esp),%xmm5
+       movdqa  %xmm7,64(%esp)
+       pxor    %xmm7,%xmm6
+       call    _aesni_encrypt6
+       movaps  64(%esp),%xmm1
+       xorps   (%esp),%xmm2
+       xorps   16(%esp),%xmm3
+       xorps   32(%esp),%xmm4
+       movups  %xmm2,(%edi)
+       xorps   48(%esp),%xmm5
+       movups  %xmm3,16(%edi)
+       xorps   %xmm1,%xmm6
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       leal    80(%edi),%edi
+       jmp     .L050xts_enc_done
+.align 16
+.L046xts_enc_one:
+       movups  (%esi),%xmm2
+       leal    16(%esi),%esi
+       xorps   %xmm5,%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L051enc1_loop_9:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L051enc1_loop_9
+.byte  102,15,56,221,209
+       xorps   %xmm5,%xmm2
+       movups  %xmm2,(%edi)
+       leal    16(%edi),%edi
+       movdqa  %xmm5,%xmm1
+       jmp     .L050xts_enc_done
+.align 16
+.L047xts_enc_two:
+       movaps  %xmm1,%xmm6
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       leal    32(%esi),%esi
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm4,%xmm4
+       call    _aesni_encrypt3
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       leal    32(%edi),%edi
+       movdqa  %xmm6,%xmm1
+       jmp     .L050xts_enc_done
+.align 16
+.L048xts_enc_three:
+       movaps  %xmm1,%xmm7
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       movups  32(%esi),%xmm4
        leal    48(%esi),%esi
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm7,%xmm4
+       call    _aesni_encrypt3
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm7,%xmm4
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
        leal    48(%edi),%edi
-       movups  %xmm0,-48(%edi)
+       movdqa  %xmm7,%xmm1
+       jmp     .L050xts_enc_done
+.align 16
+.L049xts_enc_four:
+       movaps  %xmm1,%xmm6
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       movups  32(%esi),%xmm4
+       xorps   (%esp),%xmm2
+       movups  48(%esi),%xmm5
+       leal    64(%esi),%esi
+       xorps   16(%esp),%xmm3
+       xorps   %xmm7,%xmm4
+       xorps   %xmm6,%xmm5
+       call    _aesni_encrypt4
+       xorps   (%esp),%xmm2
+       xorps   16(%esp),%xmm3
+       xorps   %xmm7,%xmm4
+       movups  %xmm2,(%edi)
+       xorps   %xmm6,%xmm5
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       leal    64(%edi),%edi
+       movdqa  %xmm6,%xmm1
+       jmp     .L050xts_enc_done
+.align 16
+.L045xts_enc_done6x:
+       movl    112(%esp),%eax
+       andl    $15,%eax
+       jz      .L052xts_enc_ret
+       movdqa  %xmm1,%xmm5
+       movl    %eax,112(%esp)
+       jmp     .L053xts_enc_steal
+.align 16
+.L050xts_enc_done:
+       movl    112(%esp),%eax
+       pxor    %xmm0,%xmm0
+       andl    $15,%eax
+       jz      .L052xts_enc_ret
+       pcmpgtd %xmm1,%xmm0
+       movl    %eax,112(%esp)
+       pshufd  $19,%xmm0,%xmm5
+       paddq   %xmm1,%xmm1
+       pand    96(%esp),%xmm5
+       pxor    %xmm1,%xmm5
+.L053xts_enc_steal:
+       movzbl  (%esi),%ecx
+       movzbl  -16(%edi),%edx
+       leal    1(%esi),%esi
+       movb    %cl,-16(%edi)
+       movb    %dl,(%edi)
+       leal    1(%edi),%edi
+       subl    $1,%eax
+       jnz     .L053xts_enc_steal
+       subl    112(%esp),%edi
        movl    %ebp,%edx
-       movups  %xmm1,-32(%edi)
        movl    %ebx,%ecx
+       movups  -16(%edi),%xmm2
+       xorps   %xmm5,%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L054enc1_loop_10:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L054enc1_loop_10
+.byte  102,15,56,221,209
+       xorps   %xmm5,%xmm2
        movups  %xmm2,-16(%edi)
-       ja      .L015ecb_dec_loop3
-.L014ecb_dec_tail:
-       addl    $64,%eax
-       jz      .L006ecb_ret
-       cmpl    $16,%eax
-       movups  (%esi),%xmm0
-       je      .L016ecb_dec_one
+.L052xts_enc_ret:
+       movl    116(%esp),%esp
+       popl    %edi
+       popl    %esi
+       popl    %ebx
+       popl    %ebp
+       ret
+.size  aesni_xts_encrypt,.-.L_aesni_xts_encrypt_begin
+.globl aesni_xts_decrypt
+.type  aesni_xts_decrypt,@function
+.align 16
+aesni_xts_decrypt:
+.L_aesni_xts_decrypt_begin:
+       pushl   %ebp
+       pushl   %ebx
+       pushl   %esi
+       pushl   %edi
+       movl    36(%esp),%edx
+       movl    40(%esp),%esi
+       movl    240(%edx),%ecx
+       movups  (%esi),%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L055enc1_loop_11:
+.byte  102,15,56,220,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L055enc1_loop_11
+.byte  102,15,56,221,209
+       movl    20(%esp),%esi
+       movl    24(%esp),%edi
+       movl    28(%esp),%eax
+       movl    32(%esp),%edx
+       movl    %esp,%ebp
+       subl    $120,%esp
+       andl    $-16,%esp
+       xorl    %ebx,%ebx
+       testl   $15,%eax
+       setnz   %bl
+       shll    $4,%ebx
+       subl    %ebx,%eax
+       movl    $135,96(%esp)
+       movl    $0,100(%esp)
+       movl    $1,104(%esp)
+       movl    $0,108(%esp)
+       movl    %eax,112(%esp)
+       movl    %ebp,116(%esp)
+       movl    240(%edx),%ecx
+       movl    %edx,%ebp
+       movl    %ecx,%ebx
+       movdqa  %xmm2,%xmm1
+       pxor    %xmm0,%xmm0
+       movdqa  96(%esp),%xmm3
+       pcmpgtd %xmm1,%xmm0
+       andl    $-16,%eax
+       subl    $96,%eax
+       jc      .L056xts_dec_short
+       shrl    $1,%ecx
+       movl    %ecx,%ebx
+       jmp     .L057xts_dec_loop6
+.align 16
+.L057xts_dec_loop6:
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,16(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,32(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,48(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       pshufd  $19,%xmm0,%xmm7
+       movdqa  %xmm1,64(%esp)
+       paddq   %xmm1,%xmm1
+       movaps  (%ebp),%xmm0
+       pand    %xmm3,%xmm7
+       movups  (%esi),%xmm2
+       pxor    %xmm1,%xmm7
+       movdqu  16(%esi),%xmm3
+       xorps   %xmm0,%xmm2
+       movdqu  32(%esi),%xmm4
+       pxor    %xmm0,%xmm3
+       movdqu  48(%esi),%xmm5
+       pxor    %xmm0,%xmm4
+       movdqu  64(%esi),%xmm6
+       pxor    %xmm0,%xmm5
+       movdqu  80(%esi),%xmm1
+       pxor    %xmm0,%xmm6
+       leal    96(%esi),%esi
+       pxor    (%esp),%xmm2
+       movdqa  %xmm7,80(%esp)
+       pxor    %xmm1,%xmm7
+       movaps  16(%ebp),%xmm1
+       leal    32(%ebp),%edx
+       pxor    16(%esp),%xmm3
+.byte  102,15,56,222,209
+       pxor    32(%esp),%xmm4
+.byte  102,15,56,222,217
+       pxor    48(%esp),%xmm5
+       decl    %ecx
+.byte  102,15,56,222,225
+       pxor    64(%esp),%xmm6
+.byte  102,15,56,222,233
+       pxor    %xmm0,%xmm7
+.byte  102,15,56,222,241
+       movaps  (%edx),%xmm0
+.byte  102,15,56,222,249
+       call    .L_aesni_decrypt6_enter
+       movdqa  80(%esp),%xmm1
+       pxor    %xmm0,%xmm0
+       xorps   (%esp),%xmm2
+       pcmpgtd %xmm1,%xmm0
+       xorps   16(%esp),%xmm3
+       movups  %xmm2,(%edi)
+       xorps   32(%esp),%xmm4
+       movups  %xmm3,16(%edi)
+       xorps   48(%esp),%xmm5
+       movups  %xmm4,32(%edi)
+       xorps   64(%esp),%xmm6
+       movups  %xmm5,48(%edi)
+       xorps   %xmm1,%xmm7
+       movups  %xmm6,64(%edi)
+       pshufd  $19,%xmm0,%xmm2
+       movups  %xmm7,80(%edi)
+       leal    96(%edi),%edi
+       movdqa  96(%esp),%xmm3
+       pxor    %xmm0,%xmm0
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       movl    %ebx,%ecx
+       pxor    %xmm2,%xmm1
+       subl    $96,%eax
+       jnc     .L057xts_dec_loop6
+       leal    1(,%ecx,2),%ecx
+       movl    %ebp,%edx
+       movl    %ecx,%ebx
+.L056xts_dec_short:
+       addl    $96,%eax
+       jz      .L058xts_dec_done6x
+       movdqa  %xmm1,%xmm5
        cmpl    $32,%eax
-       movups  16(%esi),%xmm1
-       je      .L017ecb_dec_two
-       cmpl    $48,%eax
-       movups  32(%esi),%xmm2
-       je      .L018ecb_dec_three
-       movups  48(%esi),%xmm7
-       call    _aesni_decrypt4
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       movups  %xmm2,32(%edi)
-       movups  %xmm7,48(%edi)
-       jmp     .L006ecb_ret
-.align 16
-.L016ecb_dec_one:
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
-       leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L019dec1_loop:
-       aesdec  %xmm4,%xmm0
+       jb      .L059xts_dec_one
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       je      .L060xts_dec_two
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,%xmm6
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       cmpl    $64,%eax
+       jb      .L061xts_dec_three
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  %xmm1,%xmm7
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+       movdqa  %xmm5,(%esp)
+       movdqa  %xmm6,16(%esp)
+       je      .L062xts_dec_four
+       movdqa  %xmm7,32(%esp)
+       pshufd  $19,%xmm0,%xmm7
+       movdqa  %xmm1,48(%esp)
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm7
+       pxor    %xmm1,%xmm7
+       movdqu  (%esi),%xmm2
+       movdqu  16(%esi),%xmm3
+       movdqu  32(%esi),%xmm4
+       pxor    (%esp),%xmm2
+       movdqu  48(%esi),%xmm5
+       pxor    16(%esp),%xmm3
+       movdqu  64(%esi),%xmm6
+       pxor    32(%esp),%xmm4
+       leal    80(%esi),%esi
+       pxor    48(%esp),%xmm5
+       movdqa  %xmm7,64(%esp)
+       pxor    %xmm7,%xmm6
+       call    _aesni_decrypt6
+       movaps  64(%esp),%xmm1
+       xorps   (%esp),%xmm2
+       xorps   16(%esp),%xmm3
+       xorps   32(%esp),%xmm4
+       movups  %xmm2,(%edi)
+       xorps   48(%esp),%xmm5
+       movups  %xmm3,16(%edi)
+       xorps   %xmm1,%xmm6
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       movups  %xmm6,64(%edi)
+       leal    80(%edi),%edi
+       jmp     .L063xts_dec_done
+.align 16
+.L059xts_dec_one:
+       movups  (%esi),%xmm2
+       leal    16(%esi),%esi
+       xorps   %xmm5,%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L064dec1_loop_12:
+.byte  102,15,56,222,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L019dec1_loop
-       aesdeclast      %xmm4,%xmm0
-       movups  %xmm0,(%edi)
-       jmp     .L006ecb_ret
+       jnz     .L064dec1_loop_12
+.byte  102,15,56,223,209
+       xorps   %xmm5,%xmm2
+       movups  %xmm2,(%edi)
+       leal    16(%edi),%edi
+       movdqa  %xmm5,%xmm1
+       jmp     .L063xts_dec_done
 .align 16
-.L017ecb_dec_two:
+.L060xts_dec_two:
+       movaps  %xmm1,%xmm6
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       leal    32(%esi),%esi
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
        call    _aesni_decrypt3
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       jmp     .L006ecb_ret
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       leal    32(%edi),%edi
+       movdqa  %xmm6,%xmm1
+       jmp     .L063xts_dec_done
 .align 16
-.L018ecb_dec_three:
+.L061xts_dec_three:
+       movaps  %xmm1,%xmm7
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       movups  32(%esi),%xmm4
+       leal    48(%esi),%esi
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm7,%xmm4
        call    _aesni_decrypt3
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       movups  %xmm2,32(%edi)
-.L006ecb_ret:
+       xorps   %xmm5,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm7,%xmm4
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       leal    48(%edi),%edi
+       movdqa  %xmm7,%xmm1
+       jmp     .L063xts_dec_done
+.align 16
+.L062xts_dec_four:
+       movaps  %xmm1,%xmm6
+       movups  (%esi),%xmm2
+       movups  16(%esi),%xmm3
+       movups  32(%esi),%xmm4
+       xorps   (%esp),%xmm2
+       movups  48(%esi),%xmm5
+       leal    64(%esi),%esi
+       xorps   16(%esp),%xmm3
+       xorps   %xmm7,%xmm4
+       xorps   %xmm6,%xmm5
+       call    _aesni_decrypt4
+       xorps   (%esp),%xmm2
+       xorps   16(%esp),%xmm3
+       xorps   %xmm7,%xmm4
+       movups  %xmm2,(%edi)
+       xorps   %xmm6,%xmm5
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       leal    64(%edi),%edi
+       movdqa  %xmm6,%xmm1
+       jmp     .L063xts_dec_done
+.align 16
+.L058xts_dec_done6x:
+       movl    112(%esp),%eax
+       andl    $15,%eax
+       jz      .L065xts_dec_ret
+       movl    %eax,112(%esp)
+       jmp     .L066xts_dec_only_one_more
+.align 16
+.L063xts_dec_done:
+       movl    112(%esp),%eax
+       pxor    %xmm0,%xmm0
+       andl    $15,%eax
+       jz      .L065xts_dec_ret
+       pcmpgtd %xmm1,%xmm0
+       movl    %eax,112(%esp)
+       pshufd  $19,%xmm0,%xmm2
+       pxor    %xmm0,%xmm0
+       movdqa  96(%esp),%xmm3
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm2
+       pcmpgtd %xmm1,%xmm0
+       pxor    %xmm2,%xmm1
+.L066xts_dec_only_one_more:
+       pshufd  $19,%xmm0,%xmm5
+       movdqa  %xmm1,%xmm6
+       paddq   %xmm1,%xmm1
+       pand    %xmm3,%xmm5
+       pxor    %xmm1,%xmm5
+       movl    %ebp,%edx
+       movl    %ebx,%ecx
+       movups  (%esi),%xmm2
+       xorps   %xmm5,%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L067dec1_loop_13:
+.byte  102,15,56,222,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L067dec1_loop_13
+.byte  102,15,56,223,209
+       xorps   %xmm5,%xmm2
+       movups  %xmm2,(%edi)
+.L068xts_dec_steal:
+       movzbl  16(%esi),%ecx
+       movzbl  (%edi),%edx
+       leal    1(%esi),%esi
+       movb    %cl,(%edi)
+       movb    %dl,16(%edi)
+       leal    1(%edi),%edi
+       subl    $1,%eax
+       jnz     .L068xts_dec_steal
+       subl    112(%esp),%edi
+       movl    %ebp,%edx
+       movl    %ebx,%ecx
+       movups  (%edi),%xmm2
+       xorps   %xmm6,%xmm2
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L069dec1_loop_14:
+.byte  102,15,56,222,209
+       decl    %ecx
+       movaps  (%edx),%xmm1
+       leal    16(%edx),%edx
+       jnz     .L069dec1_loop_14
+.byte  102,15,56,223,209
+       xorps   %xmm6,%xmm2
+       movups  %xmm2,(%edi)
+.L065xts_dec_ret:
+       movl    116(%esp),%esp
        popl    %edi
        popl    %esi
        popl    %ebx
        popl    %ebp
        ret
-.size  aesni_ecb_encrypt,.-.L_aesni_ecb_encrypt_begin
+.size  aesni_xts_decrypt,.-.L_aesni_xts_decrypt_begin
 .globl aesni_cbc_encrypt
 .type  aesni_cbc_encrypt,@function
 .align 16
@@ -397,50 +1715,55 @@ aesni_cbc_encrypt:
        pushl   %esi
        pushl   %edi
        movl    20(%esp),%esi
+       movl    %esp,%ebx
        movl    24(%esp),%edi
+       subl    $24,%ebx
        movl    28(%esp),%eax
+       andl    $-16,%ebx
        movl    32(%esp),%edx
-       testl   %eax,%eax
        movl    36(%esp),%ebp
-       jz      .L020cbc_ret
+       testl   %eax,%eax
+       jz      .L070cbc_abort
        cmpl    $0,40(%esp)
-       movups  (%ebp),%xmm5
+       xchgl   %esp,%ebx
+       movups  (%ebp),%xmm7
        movl    240(%edx),%ecx
        movl    %edx,%ebp
+       movl    %ebx,16(%esp)
        movl    %ecx,%ebx
-       je      .L021cbc_decrypt
-       movaps  %xmm5,%xmm0
+       je      .L071cbc_decrypt
+       movaps  %xmm7,%xmm2
        cmpl    $16,%eax
-       jb      .L022cbc_enc_tail
+       jb      .L072cbc_enc_tail
        subl    $16,%eax
-       jmp     .L023cbc_enc_loop
+       jmp     .L073cbc_enc_loop
 .align 16
-.L023cbc_enc_loop:
-       movups  (%esi),%xmm5
+.L073cbc_enc_loop:
+       movups  (%esi),%xmm7
        leal    16(%esi),%esi
-       pxor    %xmm5,%xmm0
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       xorps   %xmm0,%xmm7
        leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L024enc1_loop:
-       aesenc  %xmm4,%xmm0
+       xorps   %xmm7,%xmm2
+.L074enc1_loop_15:
+.byte  102,15,56,220,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L024enc1_loop
-       aesenclast      %xmm4,%xmm0
-       subl    $16,%eax
-       leal    16(%edi),%edi
+       jnz     .L074enc1_loop_15
+.byte  102,15,56,221,209
        movl    %ebx,%ecx
        movl    %ebp,%edx
-       movups  %xmm0,-16(%edi)
-       jnc     .L023cbc_enc_loop
+       movups  %xmm2,(%edi)
+       leal    16(%edi),%edi
+       subl    $16,%eax
+       jnc     .L073cbc_enc_loop
        addl    $16,%eax
-       jnz     .L022cbc_enc_tail
-       movaps  %xmm0,%xmm5
-       jmp     .L020cbc_ret
-.L022cbc_enc_tail:
+       jnz     .L072cbc_enc_tail
+       movaps  %xmm2,%xmm7
+       jmp     .L075cbc_ret
+.L072cbc_enc_tail:
        movl    %eax,%ecx
 .long  2767451785
        movl    $16,%ecx
@@ -451,113 +1774,169 @@ aesni_cbc_encrypt:
        movl    %ebx,%ecx
        movl    %edi,%esi
        movl    %ebp,%edx
-       jmp     .L023cbc_enc_loop
+       jmp     .L073cbc_enc_loop
 .align 16
-.L021cbc_decrypt:
-       subl    $64,%eax
-       jbe     .L025cbc_dec_tail
-       jmp     .L026cbc_dec_loop3
+.L071cbc_decrypt:
+       cmpl    $80,%eax
+       jbe     .L076cbc_dec_tail
+       movaps  %xmm7,(%esp)
+       subl    $80,%eax
+       jmp     .L077cbc_dec_loop6_enter
 .align 16
-.L026cbc_dec_loop3:
-       movups  (%esi),%xmm0
-       movups  16(%esi),%xmm1
-       movups  32(%esi),%xmm2
-       movaps  %xmm0,%xmm6
-       movaps  %xmm1,%xmm7
-       call    _aesni_decrypt3
-       subl    $48,%eax
-       leal    48(%esi),%esi
-       leal    48(%edi),%edi
-       pxor    %xmm5,%xmm0
-       pxor    %xmm6,%xmm1
-       movups  -16(%esi),%xmm5
-       pxor    %xmm7,%xmm2
-       movups  %xmm0,-48(%edi)
+.L078cbc_dec_loop6:
+       movaps  %xmm0,(%esp)
+       movups  %xmm7,(%edi)
+       leal    16(%edi),%edi
+.L077cbc_dec_loop6_enter:
+       movdqu  (%esi),%xmm2
+       movdqu  16(%esi),%xmm3
+       movdqu  32(%esi),%xmm4
+       movdqu  48(%esi),%xmm5
+       movdqu  64(%esi),%xmm6
+       movdqu  80(%esi),%xmm7
+       call    _aesni_decrypt6
+       movups  (%esi),%xmm1
+       movups  16(%esi),%xmm0
+       xorps   (%esp),%xmm2
+       xorps   %xmm1,%xmm3
+       movups  32(%esi),%xmm1
+       xorps   %xmm0,%xmm4
+       movups  48(%esi),%xmm0
+       xorps   %xmm1,%xmm5
+       movups  64(%esi),%xmm1
+       xorps   %xmm0,%xmm6
+       movups  80(%esi),%xmm0
+       xorps   %xmm1,%xmm7
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       leal    96(%esi),%esi
+       movups  %xmm4,32(%edi)
        movl    %ebx,%ecx
-       movups  %xmm1,-32(%edi)
+       movups  %xmm5,48(%edi)
        movl    %ebp,%edx
-       movups  %xmm2,-16(%edi)
-       ja      .L026cbc_dec_loop3
-.L025cbc_dec_tail:
-       addl    $64,%eax
-       jz      .L020cbc_ret
-       movups  (%esi),%xmm0
+       movups  %xmm6,64(%edi)
+       leal    80(%edi),%edi
+       subl    $96,%eax
+       ja      .L078cbc_dec_loop6
+       movaps  %xmm7,%xmm2
+       movaps  %xmm0,%xmm7
+       addl    $80,%eax
+       jle     .L079cbc_dec_tail_collected
+       movups  %xmm2,(%edi)
+       leal    16(%edi),%edi
+.L076cbc_dec_tail:
+       movups  (%esi),%xmm2
+       movaps  %xmm2,%xmm6
        cmpl    $16,%eax
-       movaps  %xmm0,%xmm6
-       jbe     .L027cbc_dec_one
-       movups  16(%esi),%xmm1
-       cmpl    $32,%eax
-       movaps  %xmm1,%xmm7
-       jbe     .L028cbc_dec_two
-       movups  32(%esi),%xmm2
-       cmpl    $48,%eax
-       jbe     .L029cbc_dec_three
-       movups  48(%esi),%xmm7
-       call    _aesni_decrypt4
+       jbe     .L080cbc_dec_one
        movups  16(%esi),%xmm3
+       movaps  %xmm3,%xmm5
+       cmpl    $32,%eax
+       jbe     .L081cbc_dec_two
        movups  32(%esi),%xmm4
-       pxor    %xmm5,%xmm0
-       pxor    %xmm6,%xmm1
+       cmpl    $48,%eax
+       jbe     .L082cbc_dec_three
        movups  48(%esi),%xmm5
-       movups  %xmm0,(%edi)
-       pxor    %xmm3,%xmm2
-       pxor    %xmm4,%xmm7
-       movups  %xmm1,16(%edi)
-       movups  %xmm2,32(%edi)
-       movaps  %xmm7,%xmm0
-       leal    48(%edi),%edi
-       jmp     .L030cbc_dec_tail_collected
-.L027cbc_dec_one:
-       movups  (%edx),%xmm3
-       movups  16(%edx),%xmm4
-       leal    32(%edx),%edx
-       pxor    %xmm3,%xmm0
-.L031dec1_loop:
-       aesdec  %xmm4,%xmm0
+       cmpl    $64,%eax
+       jbe     .L083cbc_dec_four
+       movups  64(%esi),%xmm6
+       movaps  %xmm7,(%esp)
+       movups  (%esi),%xmm2
+       xorps   %xmm7,%xmm7
+       call    _aesni_decrypt6
+       movups  (%esi),%xmm1
+       movups  16(%esi),%xmm0
+       xorps   (%esp),%xmm2
+       xorps   %xmm1,%xmm3
+       movups  32(%esi),%xmm1
+       xorps   %xmm0,%xmm4
+       movups  48(%esi),%xmm0
+       xorps   %xmm1,%xmm5
+       movups  64(%esi),%xmm7
+       xorps   %xmm0,%xmm6
+       movups  %xmm2,(%edi)
+       movups  %xmm3,16(%edi)
+       movups  %xmm4,32(%edi)
+       movups  %xmm5,48(%edi)
+       leal    64(%edi),%edi
+       movaps  %xmm6,%xmm2
+       subl    $80,%eax
+       jmp     .L079cbc_dec_tail_collected
+.align 16
+.L080cbc_dec_one:
+       movaps  (%edx),%xmm0
+       movaps  16(%edx),%xmm1
+       leal    32(%edx),%edx
+       xorps   %xmm0,%xmm2
+.L084dec1_loop_16:
+.byte  102,15,56,222,209
        decl    %ecx
-       movups  (%edx),%xmm4
+       movaps  (%edx),%xmm1
        leal    16(%edx),%edx
-       jnz     .L031dec1_loop
-       aesdeclast      %xmm4,%xmm0
-       pxor    %xmm5,%xmm0
-       movaps  %xmm6,%xmm5
-       jmp     .L030cbc_dec_tail_collected
-.L028cbc_dec_two:
+       jnz     .L084dec1_loop_16
+.byte  102,15,56,223,209
+       xorps   %xmm7,%xmm2
+       movaps  %xmm6,%xmm7
+       subl    $16,%eax
+       jmp     .L079cbc_dec_tail_collected
+.align 16
+.L081cbc_dec_two:
+       xorps   %xmm4,%xmm4
        call    _aesni_decrypt3
-       pxor    %xmm5,%xmm0
-       pxor    %xmm6,%xmm1
-       movups  %xmm0,(%edi)
-       movaps  %xmm1,%xmm0
-       movaps  %xmm7,%xmm5
+       xorps   %xmm7,%xmm2
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       movaps  %xmm3,%xmm2
        leal    16(%edi),%edi
-       jmp     .L030cbc_dec_tail_collected
-.L029cbc_dec_three:
+       movaps  %xmm5,%xmm7
+       subl    $32,%eax
+       jmp     .L079cbc_dec_tail_collected
+.align 16
+.L082cbc_dec_three:
        call    _aesni_decrypt3
-       pxor    %xmm5,%xmm0
-       pxor    %xmm6,%xmm1
-       pxor    %xmm7,%xmm2
-       movups  %xmm0,(%edi)
-       movups  %xmm1,16(%edi)
-       movaps  %xmm2,%xmm0
-       movups  32(%esi),%xmm5
+       xorps   %xmm7,%xmm2
+       xorps   %xmm6,%xmm3
+       xorps   %xmm5,%xmm4
+       movups  %xmm2,(%edi)
+       movaps  %xmm4,%xmm2
+       movups  %xmm3,16(%edi)
        leal    32(%edi),%edi
-.L030cbc_dec_tail_collected:
+       movups  32(%esi),%xmm7
+       subl    $48,%eax
+       jmp     .L079cbc_dec_tail_collected
+.align 16
+.L083cbc_dec_four:
+       call    _aesni_decrypt4
+       movups  16(%esi),%xmm1
+       movups  32(%esi),%xmm0
+       xorps   %xmm7,%xmm2
+       movups  48(%esi),%xmm7
+       xorps   %xmm6,%xmm3
+       movups  %xmm2,(%edi)
+       xorps   %xmm1,%xmm4
+       movups  %xmm3,16(%edi)
+       xorps   %xmm0,%xmm5
+       movups  %xmm4,32(%edi)
+       leal    48(%edi),%edi
+       movaps  %xmm5,%xmm2
+       subl    $64,%eax
+.L079cbc_dec_tail_collected:
        andl    $15,%eax
-       jnz     .L032cbc_dec_tail_partial
-       movups  %xmm0,(%edi)
-       jmp     .L020cbc_ret
-.L032cbc_dec_tail_partial:
-       movl    %esp,%ebp
-       subl    $16,%esp
-       andl    $-16,%esp
-       movaps  %xmm0,(%esp)
+       jnz     .L085cbc_dec_tail_partial
+       movups  %xmm2,(%edi)
+       jmp     .L075cbc_ret
+.align 16
+.L085cbc_dec_tail_partial:
+       movaps  %xmm2,(%esp)
+       movl    $16,%ecx
        movl    %esp,%esi
-       movl    %eax,%ecx
+       subl    %eax,%ecx
 .long  2767451785
-       movl    %ebp,%esp
-.L020cbc_ret:
+.L075cbc_ret:
+       movl    16(%esp),%esp
        movl    36(%esp),%ebp
-       movups  %xmm5,(%ebp)
+       movups  %xmm7,(%ebp)
+.L070cbc_abort:
        popl    %edi
        popl    %esi
        popl    %ebx
@@ -568,97 +1947,97 @@ aesni_cbc_encrypt:
 .align 16
 _aesni_set_encrypt_key:
        testl   %eax,%eax
-       jz      .L033bad_pointer
+       jz      .L086bad_pointer
        testl   %edx,%edx
-       jz      .L033bad_pointer
+       jz      .L086bad_pointer
        movups  (%eax),%xmm0
-       pxor    %xmm4,%xmm4
+       xorps   %xmm4,%xmm4
        leal    16(%edx),%edx
        cmpl    $256,%ecx
-       je      .L03414rounds
+       je      .L08714rounds
        cmpl    $192,%ecx
-       je      .L03512rounds
+       je      .L08812rounds
        cmpl    $128,%ecx
-       jne     .L036bad_keybits
+       jne     .L089bad_keybits
 .align 16
-.L03710rounds:
+.L09010rounds:
        movl    $9,%ecx
-       movups  %xmm0,-16(%edx)
-       aeskeygenassist $1,%xmm0,%xmm1
-       call    .L038key_128_cold
-       aeskeygenassist $2,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $4,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $8,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $16,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $32,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $64,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $128,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $27,%xmm0,%xmm1
-       call    .L039key_128
-       aeskeygenassist $54,%xmm0,%xmm1
-       call    .L039key_128
-       movups  %xmm0,(%edx)
+       movaps  %xmm0,-16(%edx)
+.byte  102,15,58,223,200,1
+       call    .L091key_128_cold
+.byte  102,15,58,223,200,2
+       call    .L092key_128
+.byte  102,15,58,223,200,4
+       call    .L092key_128
+.byte  102,15,58,223,200,8
+       call    .L092key_128
+.byte  102,15,58,223,200,16
+       call    .L092key_128
+.byte  102,15,58,223,200,32
+       call    .L092key_128
+.byte  102,15,58,223,200,64
+       call    .L092key_128
+.byte  102,15,58,223,200,128
+       call    .L092key_128
+.byte  102,15,58,223,200,27
+       call    .L092key_128
+.byte  102,15,58,223,200,54
+       call    .L092key_128
+       movaps  %xmm0,(%edx)
        movl    %ecx,80(%edx)
        xorl    %eax,%eax
        ret
 .align 16
-.L039key_128:
-       movups  %xmm0,(%edx)
+.L092key_128:
+       movaps  %xmm0,(%edx)
        leal    16(%edx),%edx
-.L038key_128_cold:
+.L091key_128_cold:
        shufps  $16,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
-       pshufd  $255,%xmm1,%xmm1
-       pxor    %xmm1,%xmm0
+       xorps   %xmm4,%xmm0
+       shufps  $255,%xmm1,%xmm1
+       xorps   %xmm1,%xmm0
        ret
 .align 16
-.L03512rounds:
+.L08812rounds:
        movq    16(%eax),%xmm2
        movl    $11,%ecx
-       movups  %xmm0,-16(%edx)
-       aeskeygenassist $1,%xmm2,%xmm1
-       call    .L040key_192a_cold
-       aeskeygenassist $2,%xmm2,%xmm1
-       call    .L041key_192b
-       aeskeygenassist $4,%xmm2,%xmm1
-       call    .L042key_192a
-       aeskeygenassist $8,%xmm2,%xmm1
-       call    .L041key_192b
-       aeskeygenassist $16,%xmm2,%xmm1
-       call    .L042key_192a
-       aeskeygenassist $32,%xmm2,%xmm1
-       call    .L041key_192b
-       aeskeygenassist $64,%xmm2,%xmm1
-       call    .L042key_192a
-       aeskeygenassist $128,%xmm2,%xmm1
-       call    .L041key_192b
-       movups  %xmm0,(%edx)
+       movaps  %xmm0,-16(%edx)
+.byte  102,15,58,223,202,1
+       call    .L093key_192a_cold
+.byte  102,15,58,223,202,2
+       call    .L094key_192b
+.byte  102,15,58,223,202,4
+       call    .L095key_192a
+.byte  102,15,58,223,202,8
+       call    .L094key_192b
+.byte  102,15,58,223,202,16
+       call    .L095key_192a
+.byte  102,15,58,223,202,32
+       call    .L094key_192b
+.byte  102,15,58,223,202,64
+       call    .L095key_192a
+.byte  102,15,58,223,202,128
+       call    .L094key_192b
+       movaps  %xmm0,(%edx)
        movl    %ecx,48(%edx)
        xorl    %eax,%eax
        ret
 .align 16
-.L042key_192a:
-       movups  %xmm0,(%edx)
+.L095key_192a:
+       movaps  %xmm0,(%edx)
        leal    16(%edx),%edx
 .align 16
-.L040key_192a_cold:
+.L093key_192a_cold:
        movaps  %xmm2,%xmm5
-.L043key_192b_warm:
+.L096key_192b_warm:
        shufps  $16,%xmm0,%xmm4
-       movaps  %xmm2,%xmm3
-       pxor    %xmm4,%xmm0
+       movdqa  %xmm2,%xmm3
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
        pslldq  $4,%xmm3
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        pshufd  $85,%xmm1,%xmm1
        pxor    %xmm3,%xmm2
        pxor    %xmm1,%xmm0
@@ -666,80 +2045,80 @@ _aesni_set_encrypt_key:
        pxor    %xmm3,%xmm2
        ret
 .align 16
-.L041key_192b:
+.L094key_192b:
        movaps  %xmm0,%xmm3
        shufps  $68,%xmm0,%xmm5
-       movups  %xmm5,(%edx)
+       movaps  %xmm5,(%edx)
        shufps  $78,%xmm2,%xmm3
-       movups  %xmm3,16(%edx)
+       movaps  %xmm3,16(%edx)
        leal    32(%edx),%edx
-       jmp     .L043key_192b_warm
+       jmp     .L096key_192b_warm
 .align 16
-.L03414rounds:
+.L08714rounds:
        movups  16(%eax),%xmm2
        movl    $13,%ecx
        leal    16(%edx),%edx
-       movups  %xmm0,-32(%edx)
-       movups  %xmm2,-16(%edx)
-       aeskeygenassist $1,%xmm2,%xmm1
-       call    .L044key_256a_cold
-       aeskeygenassist $1,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $2,%xmm2,%xmm1
-       call    .L046key_256a
-       aeskeygenassist $2,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $4,%xmm2,%xmm1
-       call    .L046key_256a
-       aeskeygenassist $4,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $8,%xmm2,%xmm1
-       call    .L046key_256a
-       aeskeygenassist $8,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $16,%xmm2,%xmm1
-       call    .L046key_256a
-       aeskeygenassist $16,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $32,%xmm2,%xmm1
-       call    .L046key_256a
-       aeskeygenassist $32,%xmm0,%xmm1
-       call    .L045key_256b
-       aeskeygenassist $64,%xmm2,%xmm1
-       call    .L046key_256a
-       movups  %xmm0,(%edx)
+       movaps  %xmm0,-32(%edx)
+       movaps  %xmm2,-16(%edx)
+.byte  102,15,58,223,202,1
+       call    .L097key_256a_cold
+.byte  102,15,58,223,200,1
+       call    .L098key_256b
+.byte  102,15,58,223,202,2
+       call    .L099key_256a
+.byte  102,15,58,223,200,2
+       call    .L098key_256b
+.byte  102,15,58,223,202,4
+       call    .L099key_256a
+.byte  102,15,58,223,200,4
+       call    .L098key_256b
+.byte  102,15,58,223,202,8
+       call    .L099key_256a
+.byte  102,15,58,223,200,8
+       call    .L098key_256b
+.byte  102,15,58,223,202,16
+       call    .L099key_256a
+.byte  102,15,58,223,200,16
+       call    .L098key_256b
+.byte  102,15,58,223,202,32
+       call    .L099key_256a
+.byte  102,15,58,223,200,32
+       call    .L098key_256b
+.byte  102,15,58,223,202,64
+       call    .L099key_256a
+       movaps  %xmm0,(%edx)
        movl    %ecx,16(%edx)
        xorl    %eax,%eax
        ret
 .align 16
-.L046key_256a:
-       movups  %xmm2,(%edx)
+.L099key_256a:
+       movaps  %xmm2,(%edx)
        leal    16(%edx),%edx
-.L044key_256a_cold:
+.L097key_256a_cold:
        shufps  $16,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
+       xorps   %xmm4,%xmm0
        shufps  $140,%xmm0,%xmm4
-       pxor    %xmm4,%xmm0
-       pshufd  $255,%xmm1,%xmm1
-       pxor    %xmm1,%xmm0
+       xorps   %xmm4,%xmm0
+       shufps  $255,%xmm1,%xmm1
+       xorps   %xmm1,%xmm0
        ret
 .align 16
-.L045key_256b:
-       movups  %xmm0,(%edx)
+.L098key_256b:
+       movaps  %xmm0,(%edx)
        leal    16(%edx),%edx
        shufps  $16,%xmm2,%xmm4
-       pxor    %xmm4,%xmm2
+       xorps   %xmm4,%xmm2
        shufps  $140,%xmm2,%xmm4
-       pxor    %xmm4,%xmm2
-       pshufd  $170,%xmm1,%xmm1
-       pxor    %xmm1,%xmm2
+       xorps   %xmm4,%xmm2
+       shufps  $170,%xmm1,%xmm1
+       xorps   %xmm1,%xmm2
        ret
 .align 4
-.L033bad_pointer:
+.L086bad_pointer:
        movl    $-1,%eax
        ret
 .align 4
-.L036bad_keybits:
+.L089bad_keybits:
        movl    $-2,%eax
        ret
 .size  _aesni_set_encrypt_key,.-_aesni_set_encrypt_key
@@ -766,30 +2145,30 @@ aesni_set_decrypt_key:
        movl    12(%esp),%edx
        shll    $4,%ecx
        testl   %eax,%eax
-       jnz     .L047dec_key_ret
+       jnz     .L100dec_key_ret
        leal    16(%edx,%ecx,1),%eax
-       movups  (%edx),%xmm0
-       movups  (%eax),%xmm1
-       movups  %xmm0,(%eax)
-       movups  %xmm1,(%edx)
+       movaps  (%edx),%xmm0
+       movaps  (%eax),%xmm1
+       movaps  %xmm0,(%eax)
+       movaps  %xmm1,(%edx)
        leal    16(%edx),%edx
        leal    -16(%eax),%eax
-.L048dec_key_inverse:
-       movups  (%edx),%xmm0
-       movups  (%eax),%xmm1
-       aesimc  %xmm0,%xmm0
-       aesimc  %xmm1,%xmm1
+.L101dec_key_inverse:
+       movaps  (%edx),%xmm0
+       movaps  (%eax),%xmm1
+.byte  102,15,56,219,192
+.byte  102,15,56,219,201
        leal    16(%edx),%edx
        leal    -16(%eax),%eax
+       movaps  %xmm0,16(%eax)
+       movaps  %xmm1,-16(%edx)
        cmpl    %edx,%eax
-       movups  %xmm0,16(%eax)
-       movups  %xmm1,-16(%edx)
-       ja      .L048dec_key_inverse
-       movups  (%edx),%xmm0
-       aesimc  %xmm0,%xmm0
-       movups  %xmm0,(%edx)
+       ja      .L101dec_key_inverse
+       movaps  (%edx),%xmm0
+.byte  102,15,56,219,192
+       movaps  %xmm0,(%edx)
        xorl    %eax,%eax
-.L047dec_key_ret:
+.L100dec_key_ret:
        ret
 .size  aesni_set_decrypt_key,.-.L_aesni_set_decrypt_key_begin
 .byte  65,69,83,32,102,111,114,32,73,110,116,101,108,32,65,69
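
The XTS paths above (.L044xts_enc_loop6, .L057xts_dec_loop6 and the *_short tails) keep a schedule of tweak values on the stack and advance them with pshufd/pcmpgtd/paddq/pxor sequences against the constant 135 (0x87) stored at 96(%esp). As a minimal C sketch of what each of those steps computes (illustrative only, not part of the patch; the helper name is hypothetical), the tweak is doubled in GF(2^128) modulo x^128 + x^7 + x^2 + x + 1:

    /* Hypothetical helper, for illustration: multiply an XTS tweak by two
     * in GF(2^128).  The tweak is a 16-byte little-endian value; a carry
     * out of bit 127 is reduced by XORing 0x87 (the 135 above) into byte 0. */
    static void xts_double_tweak(unsigned char t[16])
    {
      unsigned int carry = 0, i;

      for (i = 0; i < 16; i++) {
        unsigned int msb = (t[i] >> 7) & 1;   /* bit shifted out of this byte */
        t[i] = (unsigned char)((t[i] << 1) | carry);
        carry = msb;
      }
      if (carry)                              /* bit 127 fell off: reduce */
        t[0] ^= 0x87;
    }
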
diff --git a/lib/accelerated/x86.h b/lib/accelerated/x86.h
index c344283..8886516 100644
--- a/lib/accelerated/x86.h
+++ b/lib/accelerated/x86.h
@@ -1,3 +1,12 @@
+#include <config.h>
+
+#ifdef HAVE_CPUID_H
+# include <cpuid.h>
+# define cpuid __cpuid
+
+#else
 #define cpuid(func,ax,bx,cx,dx)\
   __asm__ __volatile__ ("cpuid":\
   "=a" (ax), "=b" (bx), "=c" (cx), "=d" (dx) : "a" (func));
+
+#endif
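
The new lib/accelerated/x86.h prefers GCC's <cpuid.h> wrapper when configure defines HAVE_CPUID_H and otherwise falls back to the inline-asm macro; both forms take the leaf number followed by four output variables, so callers can be written the same way against either. As a minimal usage sketch (not part of the patch; the helper name is hypothetical), AES-NI support is reported in bit 25 of ECX for CPUID leaf 1:

    #include "x86.h"   /* provides cpuid(func, ax, bx, cx, dx) either way */

    /* Hypothetical helper, for illustration: non-zero when the CPU
     * advertises the AES-NI instruction set (CPUID.1:ECX bit 25). */
    static int have_aesni(void)
    {
      unsigned int a, b, c, d;

      cpuid(1, a, b, c, d);
      return (c & (1u << 25)) != 0;
    }
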
diff --git a/lib/nettle/Makefile.am b/lib/nettle/Makefile.am
index 500117b..89622c4 100644
--- a/lib/nettle/Makefile.am
+++ b/lib/nettle/Makefile.am
@@ -36,7 +36,6 @@ noinst_LTLIBRARIES = libcrypto.la
 
 libcrypto_la_SOURCES = pk.c mpi.c mac.c cipher.c rnd.c init.c egd.c egd.h \
        multi.c ecc_free.c ecc.h ecc_make_key.c ecc_shared_secret.c \
-       ecc_test.c ecc_map.c \
-       ecc_mulmod.c ecc_points.c ecc_projective_dbl_point_3.c \
+       ecc_map.c ecc_mulmod.c ecc_points.c ecc_projective_dbl_point_3.c \
        ecc_projective_add_point.c ecc_projective_dbl_point.c \
        ecc_sign_hash.c ecc_verify_hash.c gnettle.h
diff --git a/lib/nettle/ecc_free.c b/lib/nettle/ecc_free.c
index bbf087d..b5e23f9 100644
--- a/lib/nettle/ecc_free.c
+++ b/lib/nettle/ecc_free.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_make_key.c b/lib/nettle/ecc_make_key.c
index 3667a5b..ade9e5f 100644
--- a/lib/nettle/ecc_make_key.c
+++ b/lib/nettle/ecc_make_key.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_map.c b/lib/nettle/ecc_map.c
index 2ad60bb..a68feb0 100644
--- a/lib/nettle/ecc_map.c
+++ b/lib/nettle/ecc_map.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_mulmod.c b/lib/nettle/ecc_mulmod.c
index c8e91a4..6781b03 100644
--- a/lib/nettle/ecc_mulmod.c
+++ b/lib/nettle/ecc_mulmod.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_points.c b/lib/nettle/ecc_points.c
index 7a29cb1..ff13755 100644
--- a/lib/nettle/ecc_points.c
+++ b/lib/nettle/ecc_points.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_projective_add_point.c b/lib/nettle/ecc_projective_add_point.c
index 35d12bc..b692289 100644
--- a/lib/nettle/ecc_projective_add_point.c
+++ b/lib/nettle/ecc_projective_add_point.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_projective_dbl_point_3.c b/lib/nettle/ecc_projective_dbl_point_3.c
index 1b85f68..28f08b3 100644
--- a/lib/nettle/ecc_projective_dbl_point_3.c
+++ b/lib/nettle/ecc_projective_dbl_point_3.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 - 3x + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**
diff --git a/lib/nettle/ecc_shared_secret.c b/lib/nettle/ecc_shared_secret.c
index c229870..8e41a60 100644
--- a/lib/nettle/ecc_shared_secret.c
+++ b/lib/nettle/ecc_shared_secret.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 #include <string.h>
 
diff --git a/lib/nettle/ecc_sign_hash.c b/lib/nettle/ecc_sign_hash.c
index 12be36d..be0d8d7 100644
--- a/lib/nettle/ecc_sign_hash.c
+++ b/lib/nettle/ecc_sign_hash.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 #include <nettle/dsa.h>
 
diff --git a/lib/nettle/ecc_test.c b/lib/nettle/ecc_test.c
deleted file mode 100644
index 30250fa..0000000
--- a/lib/nettle/ecc_test.c
+++ /dev/null
@@ -1,142 +0,0 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
- *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
- *
- * The library is free for all purposes without any express
- * guarantee it works.
- *
- * Tom St Denis, address@hidden, http://libtom.org
- */
-
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
- */
-#include "ecc.h"
-#include "gnettle.h"
-#include <gnutls_int.h>
-#include <algorithms.h>
-
-/**
-  @file ecc_test.c
-  ECC Crypto, Tom St Denis
-*/
-
-/**
-  Perform on the ECC system
-  @return 0 if successful
-*/
-int
-ecc_test (void)
-{
-  mpz_t modulus, order, A;
-  ecc_point *G, *GG;
-  int i, err;
-
-  if ((err = mp_init_multi (&modulus, &A, &order, NULL)) != 0)
-    {
-      return err;
-    }
-
-  G = ecc_new_point ();
-  GG = ecc_new_point ();
-  if (G == NULL || GG == NULL)
-    {
-      mp_clear_multi (&modulus, &order, NULL);
-      ecc_del_point (G);
-      ecc_del_point (GG);
-      return -1;
-    }
-
-  for (i = 1; i <= 3; i++)
-    {
-      const gnutls_ecc_curve_entry_st *st = _gnutls_ecc_curve_get_params (i);
-
-      printf ("Testing %s (%d)\n", gnutls_ecc_curve_get_name (i), i);
-
-      if (mpz_set_str (A, (char *) st->A, 16) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if (mpz_set_str (modulus, (char *) st->prime, 16) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if (mpz_set_str (order, (char *) st->order, 16) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      /* is prime actually prime? */
-      if ((err = mpz_probab_prime_p (modulus, PRIME_CHECK_PARAM)) <= 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if ((err = mpz_probab_prime_p (order, PRIME_CHECK_PARAM)) <= 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if (mpz_set_str (G->x, (char *) st->Gx, 16) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if (mpz_set_str (G->y, (char *) st->Gy, 16) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-      mpz_set_ui (G->z, 1);
-
-      /* then we should have G == (order + 1)G */
-      mpz_add_ui (order, order, 1);
-      if ((err = ecc_mulmod (order, G, GG, A, modulus, 1)) != 0)
-        {
-          goto done;
-        }
-
-      if (mpz_cmp (G->y, GG->y) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-      if (mpz_cmp (G->x, GG->x) != 0)
-        {
-          fprintf (stderr, "XXX %d\n", __LINE__);
-          err = -1;
-          goto done;
-        }
-
-    }
-  err = 0;
-done:
-  ecc_del_point (GG);
-  ecc_del_point (G);
-  mp_clear_multi (&order, &modulus, &A, NULL);
-  return err;
-}
-
-/* $Source: /cvs/libtom/libtomcrypt/src/pk/ecc/ecc_test.c,v $ */
-/* $Revision: 1.12 $ */
-/* $Date: 2007/05/12 14:32:35 $ */
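For reference, the removed ecc_test.c was a curve self-test: it checked that each curve's prime and order are probable primes and that (n + 1) * G equals G, where G is the curve generator of order n. A minimal standalone sketch of that last identity check is shown below. It uses a small textbook curve (y^2 = x^3 + 2x + 2 over F_17, generator (5, 1), order 19) with plain C integers; these parameters and helper names are illustrative only and do not come from GnuTLS or the gmp/nettle API the real file relied on.

/* toy_ecc_check.c -- sanity check in the spirit of the removed ecc_test.c:
 * verify that (n + 1) * G == G on the toy curve y^2 = x^3 + 2x + 2 over F_17
 * with generator G = (5, 1) of order n = 19 (textbook parameters, not a
 * GnuTLS curve).  Build with: cc -o toy_ecc_check toy_ecc_check.c
 */
#include <stdio.h>

#define P 17   /* field prime */
#define A 2    /* curve coefficient a */
#define N 19   /* order of the generator */

struct pt { long x, y; int inf; };        /* inf != 0 marks the point at infinity */

static long mod (long v) { v %= P; return v < 0 ? v + P : v; }

static long inv (long v)                  /* modular inverse via Fermat: v^(P-2) mod P */
{
  long r = 1, b = mod (v), e = P - 2;
  while (e) { if (e & 1) r = mod (r * b); b = mod (b * b); e >>= 1; }
  return r;
}

static struct pt add (struct pt p, struct pt q)   /* affine point addition */
{
  struct pt r = { 0, 0, 1 };
  long s;
  if (p.inf) return q;
  if (q.inf) return p;
  if (p.x == q.x && mod (p.y + q.y) == 0) return r;   /* P + (-P) = infinity */
  if (p.x == q.x && p.y == q.y)
    s = mod ((3 * p.x * p.x + A) * inv (2 * p.y));    /* tangent slope (doubling) */
  else
    s = mod ((q.y - p.y) * inv (q.x - p.x));          /* chord slope (addition) */
  r.x = mod (s * s - p.x - q.x);
  r.y = mod (s * (p.x - r.x) - p.y);
  r.inf = 0;
  return r;
}

static struct pt mul (long k, struct pt g)        /* double-and-add scalar multiplication */
{
  struct pt acc = { 0, 0, 1 };
  while (k) { if (k & 1) acc = add (acc, g); g = add (g, g); k >>= 1; }
  return acc;
}

int main (void)
{
  struct pt g = { 5, 1, 0 };
  struct pt gg = mul (N + 1, g);
  if (!gg.inf && gg.x == g.x && gg.y == g.y)
    { puts ("(n+1)*G == G: ok"); return 0; }
  puts ("(n+1)*G != G: FAILED");
  return 1;
}

Compiled and run, the sketch should print "(n+1)*G == G: ok"; the deleted ecc_test.c performed the same comparison with mpz_t values and ecc_mulmod() on the supported NIST curves.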
diff --git a/lib/nettle/ecc_verify_hash.c b/lib/nettle/ecc_verify_hash.c
index 62efae0..3c5a1e5 100644
--- a/lib/nettle/ecc_verify_hash.c
+++ b/lib/nettle/ecc_verify_hash.c
@@ -1,19 +1,29 @@
-/* LibTomCrypt, modular cryptographic library -- Tom St Denis
+/*
+ * Copyright (C) 2011 Free Software Foundation, Inc.
  *
- * LibTomCrypt is a library that provides various cryptographic
- * algorithms in a highly modular and flexible manner.
+ * This file is part of GNUTLS.
  *
- * The library is free for all purposes without any express
- * guarantee it works.
+ * The GNUTLS library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA
  *
- * Tom St Denis, address@hidden, http://libtom.org
  */
 
-/* Implements ECC over Z/pZ for curve y^2 = x^3 + ax + b
- *
- * All curves taken from NIST recommendation paper of July 1999
- * Available at http://csrc.nist.gov/cryptval/dss.htm
+/* Based on public domain code of LibTomCrypt by Tom St Denis.
+ * Adapted to gmp and nettle by Nikos Mavrogiannopoulos.
  */
+
 #include "ecc.h"
 
 /**


hooks/post-receive
-- 
GNU gnutls


