A comparison between InnoDB and WARP on two decades of the BTS ontime flight data, a 109 column wide table.
Hi. I’m Justin Swanhart, the CEO and founder of Leap DB. I’ve been working with databases and distributed systems for my entire twenty-two-year career.
Load WARP table (11 mins) File '/data/ontime/ontime.csv' (79.68 GB) was imported in 9 min 58.1429 sec at 133.22 MB/s Total rows affected in ontime.ontime_warp: Records: 157928010 Deleted: 0 Skipped: 0 Warnings: 0 Load InnoDB table (37.5 mins) File '/data/ontime/ontime.csv' (79.68 GB) was imported in 37 min 28.6070 sec at 35.44 MB/s Total rows affected in ontime.ontime_innodb: Records: 157928010 Deleted: 0 Skipped: 0 Warnings: 0Note that there is an index on FlightDate on ontime_innodb.
mysql> select count(*) from ontime_warp; +-----------+ | count(*) | +-----------+ | 157928010 | +-----------+ 1 row in set (42.19 sec) mysql> select count(*) from ontime_innodb; +-----------+ | count(*) | +-----------+ | 157928020 | +-----------+ 1 row in set (4 min 13.73 sec)
mysql> select Year, count(*) cnt from ontime_innodb group by 1; +------+---------+ | Year | cnt | +------+---------+ | 1987 | 1311827 | | 1992 | 5092166 | | 1988 | 5202096 | | 1989 | 5041200 | | 1990 | 5270893 | | 1991 | 5076925 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 3422767 | | 2002 | 4935434 | | 2003 | 6488540 | | 2004 | 7129270 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 6369482 | | 2014 | 5819811 | +------+---------+ 28 rows in set (8 min 7.4194 sec) > select Year, count(*) cnt from ontime group by 1; +------+---------+ | Year | cnt | +------+---------+ | 1987 | 1311826 | | 2014 | 5819811 | | 1988 | 5202096 | | 1989 | 5041200 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 1990 | 5270893 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 6369482 | | 1991 | 5076925 | | 1992 | 5092157 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 3422767 | | 2002 | 4935434 | | 2003 | 6488540 | | 2004 | 7129270 | +------+---------+ 28 rows in set (1 min 9.2067 sec)
mysql> select count(*) cnt from ontime_innodb where Year = 1998; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (8 min 17.9652 sec) mysql> select count(*) cnt from ontime_warp where Year = 1998; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (8.1808 sec) mysql> select count(*) cnt from ontime_warp where Year = 1998; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (2.1808 sec)
mysql> select count(*) cnt from ontime_innodb where FlightDate between 19980101 and 19981231; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (8.1397 sec) > select count(*) cnt from ontime_innodb where FlightDate between 19980101 and 19981231; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (1.9824 sec) > select count(*) cnt from ontime_warp where FlightDate between 19980101 and 19981231; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (9.8641 sec) > select count(*) cnt from ontime_warp where FlightDate between 19980101 and 19981231; +---------+ | cnt | +---------+ | 5384721 | +---------+ 1 row in set (2.1297 sec)
mysql> select Year(FlightDate), count(*) cnt from ontime_innodb group by 1; +------------------+---------+ | Year(FlightDate) | cnt | +------------------+---------+ | 1987 | 1311827 | | 1988 | 5202096 | | 1989 | 5041200 | | 1990 | 5270893 | | 1991 | 5076925 | | 1992 | 5092166 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 3422767 | | 2002 | 4935434 | | 2003 | 6488540 | | 2004 | 7129270 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 6369482 | | 2014 | 5819811 | +------------------+---------+ 28 rows in set (3 min 38.2186 sec) > select Year(FlightDate), count(*) cnt from ontime group by 1; +------------------+---------+ | Year(FlightDate) | cnt | +------------------+---------+ | 1987 | 1311826 | | 2014 | 5819811 | | 1988 | 5202096 | | 1989 | 5041200 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 1990 | 5270893 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 6369482 | | 1991 | 5076925 | | 1992 | 5092157 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 3422767 | | 2002 | 4935434 | | 2003 | 6488540 | | 2004 | 7129270 | +------------------+---------+ 28 rows in set (1 min 11.8889 sec)Note that the InnoDB index-only scan produced a sorted result, because the index is read in sorted order. The cost of the query on WARP is essentially the same as the query that used the Year column. A full column scan on WARP is 3x faster than a index-only scan on InnoDB. This is because WARP uses sequential IO instead of random IO when traversing the data.
mysql> select count(*) cnt from ontime_innodb where OriginAirportId = 10135 or DestAirportId = 10135; +--------+ | cnt | +--------+ | 275922 | +--------+ 1 row in set (8 min 10.1118 sec) mysql> select count(*) cnt from ontime_warp where OriginAirportId = 10135 or DestAirportId = 10135; +--------+ | cnt | +--------+ | 275922 | +--------+ 1 row in set (22.0516 sec) mysql> select count(*) cnt from ontime_warp where OriginAirportId = 10135 or DestAirportId = 10135; +--------+ | cnt | +--------+ | 275922 | +--------+ 1 row in set (1.0812 sec)