This is a reflection on my past two internships.

Software Engineering @ Interactive Brokers, CT

  1. Price Retriever: a SpringMVC service that fetches a price from an Oracle database given a contract ID and a date. No UI/frontend; the data comes back as JSON in the terminal (see the sketch right after this list);
  2. Kafka Health Visualizer (KHV): four families of Grafana dashboards to monitor 532 Kafka cluster health metrics via Prometheus.
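
For illustration, here is the Price Retriever's core lookup sketched in Python rather than the original SpringMVC/Java; the connection details, table, and column names are all hypothetical.

    # Illustrative Python sketch; the real Price Retriever was SpringMVC/Java.
    # Connection details, table, and column names are hypothetical.
    import datetime
    import json

    import oracledb  # the python-oracledb driver

    def get_price(contract_id: int, date: datetime.date) -> str:
        """Look up the price for a contract on a date; return it as JSON."""
        with oracledb.connect(user="app", password="secret",
                              dsn="localhost/XEPDB1") as conn:
            cur = conn.cursor()
            cur.execute(
                "SELECT price FROM contract_prices"
                " WHERE contract_id = :cid AND price_date = :pdate",
                cid=contract_id, pdate=date,
            )
            row = cur.fetchone()
        return json.dumps({"contract_id": contract_id, "date": str(date),
                           "price": row[0] if row else None})

    if __name__ == "__main__":
        print(get_price(12345, datetime.date(2023, 6, 1)))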

Additional details on item 2, the KHV:

  1. The JMX exporter and the Kafka exporter expose these metrics;
  2. The 532 metrics fall into three categories: ~464 Kafka broker metrics, ~50 producer metrics, and ~30 consumer metrics;
  3. Demo producers, consumers, and broker clusters are deployed via bash scripts;
  4. The metrics are scraped into a Prometheus server;
  5. Grafana dashboards visualize the values of these metrics held in Prometheus.
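
To make the pipeline concrete, this sketch pulls one broker metric out of Prometheus over its HTTP query API. The server address is an assumption, and the metric shown is a typical JMX-exporter name rather than one confirmed from the project.

    # Query a Kafka broker metric from Prometheus over its HTTP API.
    # The server address and metric name are assumptions.
    import requests

    PROM_URL = "http://localhost:9090"  # hypothetical Prometheus server

    def query_metric(promql: str) -> list:
        """Run an instant PromQL query and return the result vector."""
        resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": promql})
        resp.raise_for_status()
        return resp.json()["data"]["result"]

    # Per-broker messages-in rate, a typical JMX-exporter metric name.
    for series in query_metric(
            "rate(kafka_server_brokertopicmetrics_messagesin_total[5m])"):
        print(series["metric"].get("instance"), series["value"][1])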

KHV is part of CAT (Consolidated Audit Trail), an automated internal audit system mandated by FINRA and the SEC.

My part was estimated to save engineers around 450 hours.

Quant Dev @ Greenwich St. Advisors, NY

Dataset Exploder: a SQL snippet that explodes 20k daily records into 0.17 billion intraday records. The result is a synthetic dataset for quant big-data analysis; the team uses the table as a practice ground to test their tools. Once the pipeline is solid, it can be pointed at the real incoming big-data tables with little change.
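
A minimal sketch of the explode pattern, assuming SQL Server (matching the test setup below) and a per-minute grid over the 09:30-16:00 trading day; the real snippet must have used a finer grid to reach 0.17 billion rows, and all table and column names here are made up.

    # Explode daily rows into intraday rows via a CROSS JOIN with a time grid.
    # Table/column names are hypothetical; granularity is per-minute for brevity.
    import pyodbc  # assumes the SQL Server ODBC driver is installed

    EXPLODE_SQL = """
    WITH minutes AS (            -- 390 trading minutes, 09:30-16:00
        SELECT TOP (390) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS m
        FROM sys.all_objects
    )
    SELECT d.contract_id,
           DATEADD(MINUTE, m.m + 570, CAST(d.trade_date AS datetime2)) AS ts,
           d.close_price          -- a real version would perturb the price per row
    INTO   intraday_fake
    FROM   daily_data AS d
    CROSS JOIN minutes AS m;
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
        "DATABASE=quant;UID=sa;PWD=<password>;TrustServerCertificate=yes"
    )
    conn.execute(EXPLODE_SQL)
    conn.commit()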

The next project was an automated testing framework. I was given ~20 tests, each many lines long. My task was to rewrite them, or more precisely, to wrap each one into a one-line test: a single call to a run_query() method with the right parameters, behaving exactly like the original. The tests are unit tests, run via pytest.

The tests also exercise different modes, four in total: update_sql, sql, update_data, and data. In other words, there are two kinds of tests, on SQL and on data; the data modes require a live database. A sketch of the wrapped tests follows.
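
Only run_query()'s existence comes from the description above; its module path, signature, and these query names are my assumptions. The test_mode fixture is defined in the conftest.py sketch further below.

    # One-line tests, as described above; names and signature are assumptions.
    from quantlib.testing import run_query  # hypothetical import path

    def test_daily_returns(test_mode):
        run_query("daily_returns", mode=test_mode)

    def test_volume_by_contract(test_mode):
        run_query("volume_by_contract", mode=test_mode)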

Task: make all tests pass, and let the command line select which kind of test to run (write or read, SQL or data).

  1. Containerization: SQL Server is used, running inside a Docker container.
  2. The Chinook sample database is used. Azure Data Studio was installed for convenient DB operations, though it is not suited to large data imports.
  3. The SQL Server command line was used to import the Chinook tables.
  4. A TestController and a TestMode were created for testing purposes.
  5. Four kinds of operations, write_sql, read_sql, write_data, and read_data, write/read SQL or CSV files to support the assert actual == expected checks.
  6. DataFrames can be written to files directly; they do not have to be converted into strings first.
  7. A pytest command-line option and parser let the command line select the testing mode (see the conftest.py sketch after this list).
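
A conftest.py sketch of the mode-selection machinery and the DataFrame read/write helpers described in items 5-7; the option name and helper bodies are assumptions.

    # conftest.py sketch; the option name and helper bodies are assumptions.
    import pandas as pd
    import pytest

    def pytest_addoption(parser):
        # pytest --test-mode=update_sql|sql|update_data|data
        parser.addoption("--test-mode", action="store", default="sql",
                         choices=["update_sql", "sql", "update_data", "data"])

    @pytest.fixture
    def test_mode(request):
        return request.config.getoption("--test-mode")

    # Item 6 above: DataFrames go to/from files directly, no string detour.
    def write_data(df: pd.DataFrame, path: str) -> None:
        df.to_csv(path, index=False)

    def read_data(path: str) -> pd.DataFrame:
        return pd.read_csv(path)

Presumably the update_* modes regenerate the expected .sql/.csv files and the plain modes compare against them, e.g. pytest --test-mode=update_data to refresh the expected CSVs, then pytest --test-mode=data to check against them.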

This internship mainly built low-level pyLib infrastructure for app-level quant data analysis. In a way it was harder than the IB internship, because there was no framework like SpringMVC to lean on, nor widely used tooling like Apache Kafka. But it was a great experience: pytest, Docker, argument parsing, and intricate business logic.

Excellent.