These are the requirements:
1) We only need a single table schema.
2) No text field holds more than 10 characters.
3) SUPER_TABLE_1 won't have more than 10 million rows; when it reaches that limit, we automatically clone the table, add a numeric suffix, and start inserting there (e.g. SUPER_TABLE_2, SUPER_TABLE_3).
4) We should be able to query between any 2 dates, but not across years.
5) The only query we need is:
SELECT * FROM SUPER_TABLE_1 WHERE colA BETWEEN $date1 AND $date2
6) Additional filtering will be done in Java or Node (which will be on the same network).
7) We transform the result into a CSV and send it to the client.
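Steps 6 and 7 can be sketched in Node/TypeScript. This is a minimal illustration, not a final implementation: the row shape, column names, and filter format are assumptions, and the CSV step relies on the stated guarantee that text fields stay under 10 characters (so no quoting or escaping is needed).

```typescript
// Assumed row shape: every column is a string (as described in the post).
type Row = Record<string, string>;

// Step 6 (sketch): keep only rows whose values match every filter exactly.
function filterRows(rows: Row[], filters: Record<string, string>): Row[] {
  return rows.filter(row =>
    Object.entries(filters).every(([col, val]) => row[col] === val)
  );
}

// Step 7 (sketch): serialize to CSV. Fields are short plain strings per the
// requirements, so commas/quotes inside values are not handled here.
function toCsv(rows: Row[], columns: string[]): string {
  const header = columns.join(",");
  const lines = rows.map(row => columns.map(col => row[col] ?? "").join(","));
  return [header, ...lines].join("\n");
}
```

The resulting CSV string would then be POSTed to the downstream REST service.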
What Big Data solution would you use? Some technologies we have in mind are Cassandra, HBase, Couchbase, and MySQL/MariaDB.
We'll be using AWS and have a budget of $5k a month.
Nice to have:
1) It would be nice to filter by any column, but filtering only by date range (the query above) is completely fine.
2) We could relax the table-splitting requirement: keeping all data in a single table would be fine, or even splitting SUPER_TABLE_1 into multiple tables, but every time we query we need to get all columns (always querying between 2 dates, to keep it simple).
We get 100k rows of data every day. The date is the primary key, and we need to filter by 8 possible columns (all strings, one of them a date) using exact values (filtering in Java/Node is acceptable), then send that data as a CSV (JSON is fine, but it may be too big) via POST to another REST service on the same network.
We'll use AWS with a $5k monthly budget limit; we can't use other cloud services (Firebase, for example).
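For context on the table-splitting requirement, a quick back-of-the-envelope calculation (a sketch using only the figures stated above) shows how often a 10-million-row table would roll over at the stated ingest rate:

```typescript
// Sizing sketch from the figures in the post.
const rowsPerDay = 100_000;       // stated daily ingest
const tableCapacity = 10_000_000; // per-table row limit from requirement 3

const daysPerTable = tableCapacity / rowsPerDay; // 100 days to fill one table
const tablesPerYear = Math.ceil(365 / daysPerTable); // about 4 tables per year
```

So a new SUPER_TABLE_N would be created roughly every 100 days, around 4 per year, which is worth weighing when comparing the split-table design against a single partitioned table.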