Curious and digging deep for 30 years, Rob has accumulated a wealth of experience. He is proud of his ability to make technology work for people, to bring the power of the computer closer to the passion of human endeavor. He has been a technical lead for 25 of those years, teaching, training, documenting, and communicating solutions, all while getting the work done. He has worked with large companies in banking, retail, financial networks, and science, as well as with small startups, using a variety of languages and systems, from Java and Python to Oracle and MongoDB, and everything in between.
Collecting data often requires custom access methods. In this case, every robot in every warehouse sends frequent health messages. With thousands of warehouses worldwide and hundreds of robots per warehouse, that is a lot of chatter: Big Data with a capital 'B'. The data was collected through a variety of custom methods, including HTTP POSTs, direct polling of intermediate data stores, and controllers writing messages directly into Kinesis. Extraction was standardized into small, simple messages pushed to a single large Kinesis stream.
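A minimal sketch of the standardized producer side, assuming a hypothetical stream name and message shape (the real field names and stream configuration aren't part of this write-up):

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_health(robot_id: str, warehouse_id: str, status: str, uptime_s: float):
    """Push one small, standardized health message onto the shared stream."""
    message = {
        "robot_id": robot_id,
        "warehouse_id": warehouse_id,
        "status": status,      # e.g. "ok", "degraded", "down" (hypothetical values)
        "uptime_s": uptime_s,
    }
    # Partition by robot so each robot's messages stay ordered within a shard.
    kinesis.put_record(
        StreamName="robot-health",   # hypothetical stream name
        Data=json.dumps(message).encode("utf-8"),
        PartitionKey=robot_id,
    )
```

Keeping the messages small and uniform is what lets every upstream collection method, however custom, feed the same stream.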
Raw data is rarely useful on its own; it has to be transformed into something that is meaningful to people. From Kinesis, the stream of data is sent through Flink to be processed in parallel, and metrics are aggregated over five-minute windows per individual robot. Metrics include values such as cumulative uptime, current operational status, and counts of successful and failed operations. Summaries are also aggregated at the warehouse level.
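The Flink job itself is workbook-of-the-trade and isn't reproduced here; the sketch below shows the shape of the per-robot five-minute tumbling-window aggregation in plain Python, with hypothetical field names:

```python
from collections import defaultdict

WINDOW_S = 300  # five-minute tumbling windows

def window_key(msg):
    """Assign a message to a (robot, window-start) bucket."""
    return msg["robot_id"], int(msg["timestamp"] // WINDOW_S) * WINDOW_S

def aggregate(messages):
    """Roll a stream of health messages up into per-robot window metrics."""
    metrics = defaultdict(
        lambda: {"uptime_s": 0.0, "ok": 0, "failed": 0, "last_status": None})
    for msg in sorted(messages, key=lambda m: m["timestamp"]):
        m = metrics[window_key(msg)]
        m["uptime_s"] += msg["uptime_s"]
        m["ok"] += msg["ops_ok"]          # hypothetical per-message counters
        m["failed"] += msg["ops_failed"]
        m["last_status"] = msg["status"]  # most recent status wins
    return metrics
```

Flink runs exactly this kind of keyed windowing in parallel across the whole fleet, and the warehouse-level summaries are the same idea keyed by warehouse instead of robot.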
Meaningful metrics should be available in real time, but they also need to be stored so we know what is happening over time. In this case, the raw messages are archived in S3, where they can be queried (slowly) with Athena when needed. The calculated metrics are pushed to a Postgres relational database for real-time dashboards and reports.
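A minimal sketch of the metrics sink, consuming the window shape from the previous sketch and assuming a hypothetical robot_metrics table keyed by robot and window start (the actual schema isn't part of this write-up):

```python
import psycopg2

UPSERT = """
    INSERT INTO robot_metrics
        (robot_id, window_start, uptime_s, ops_ok, ops_failed, last_status)
    VALUES (%s, %s, %s, %s, %s, %s)
    ON CONFLICT (robot_id, window_start) DO UPDATE
    SET uptime_s = EXCLUDED.uptime_s,
        ops_ok = EXCLUDED.ops_ok,
        ops_failed = EXCLUDED.ops_failed,
        last_status = EXCLUDED.last_status;
"""

def store_metrics(conn, metrics):
    """Upsert aggregated windows so a late-recomputed window simply overwrites."""
    with conn, conn.cursor() as cur:
        for (robot_id, window_start), m in metrics.items():
            cur.execute(UPSERT, (robot_id, window_start,
                                 m["uptime_s"], m["ok"], m["failed"],
                                 m["last_status"]))

conn = psycopg2.connect("dbname=robots")  # connection details are deployment-specific
```

The upsert keeps the table idempotent: reprocessing a window never duplicates rows.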
The data that is meaningful right now is often context-dependent and can change. When a new problem emerges, a new view is suddenly needed, which is why flexible storage and flexible reporting are always requirements. In this case, dashboards and real-time reports live in Grafana, and ad-hoc queries are supported from Tableau.
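As an example of a view that suddenly becomes useful, a question like "which robots failed the most in the last hour?" can be answered directly against the metrics table. A sketch of such an ad-hoc query, using the same hypothetical schema as above (window_start stored as epoch seconds); a Grafana or Tableau panel would issue essentially the same SQL:

```python
import psycopg2

AD_HOC = """
    SELECT robot_id,
           SUM(ops_failed) AS failed,
           SUM(ops_ok)     AS ok
    FROM robot_metrics
    WHERE window_start >= EXTRACT(EPOCH FROM now()) - 3600
    GROUP BY robot_id
    ORDER BY failed DESC
    LIMIT 20;
"""

with psycopg2.connect("dbname=robots") as conn, conn.cursor() as cur:
    cur.execute(AD_HOC)
    for robot_id, failed, ok in cur.fetchall():
        print(f"{robot_id}: {failed} failed / {ok} ok in the last hour")
```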
Data is useful when it leads to action. In this robotics case, managers, troubleshooters, and technicians get precisely the information they need and can act on it.
Scientific data often comes from lab machines. Sometimes the best thing to do for data extraction is to keep it as simple and flexible as possible: don't use complicated, expensive, and inflexible big-data tools for smaller data sets. In this case, we used the machine's export-to-Excel function to get the data out painlessly.
Excel is beautiful for small data sets (under 100,000 rows) because it not only stores the data but supports transformation and viewing within one paradigm that non-developers can easily work with, understand, and extend. We transformed the extracted data with an Excel Visual Basic (VBA) macro that cleans it up, removes poor-quality signals, and standardizes the columns.
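The VBA macro itself is specific to that workbook; as an illustration of the same cleanup logic, here is a sketch in Python with pandas, assuming hypothetical column headers and a hypothetical signal-quality cutoff:

```python
import pandas as pd

RENAME = {"Samp. ID": "sample_id", "Sig": "signal", "Qual": "quality"}  # hypothetical headers
QUALITY_MIN = 0.8  # hypothetical cutoff for "poor quality"

def clean(path: str) -> pd.DataFrame:
    """Load the machine's Excel export, drop poor-quality rows, standardize columns."""
    df = pd.read_excel(path)
    df = df.rename(columns=RENAME)                  # standardize column names
    df = df[df["quality"] >= QUALITY_MIN]           # remove poor-quality signals
    df = df.dropna(subset=["sample_id", "signal"])  # drop incomplete rows
    return df.reset_index(drop=True)

clean("machine_export.xlsx").to_excel("cleaned.xlsx", index=False)
```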
The transformed data is left on a sheet that is clearly visible to the user, so they know precisely what is being graphed and displayed. Excel allows this kind of user-friendly transparency, unlike a database, where the data is hidden and opaque. That matters in a scientific application, where you must be confident about exactly what you are graphing or reporting as results.
Excel has great facilities for displaying graphs. The cleaned-up data was moved by VBA macros to new sheets and transposed into tabular form for graphing, and a heat-map view was generated. Users can create new graphs easily as needed, or we can add them on request.
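In Excel, a heat map is typically a color-scale conditional format applied over a rectangular table of values. A sketch of the same idea in Python with openpyxl, assuming the cleaned values start at cell B2 of the active sheet:

```python
from openpyxl import load_workbook
from openpyxl.formatting.rule import ColorScaleRule

wb = load_workbook("cleaned.xlsx")
ws = wb.active

# Color every value cell from blue (low) through white to red (high).
heat = ColorScaleRule(start_type="min", start_color="4F81BD",
                      mid_type="percentile", mid_value=50, mid_color="FFFFFF",
                      end_type="max", end_color="C0504F")

# Assumes the value range begins at B2 and runs to the last used cell.
data_range = f"B2:{ws.cell(row=ws.max_row, column=ws.max_column).coordinate}"
ws.conditional_formatting.add(data_range, heat)
wb.save("heatmap.xlsx")
```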
The results are easily viewed and easily added to documents and communications.
We are available for custom work, discussions, or advice. Contact us at info@ourunlimitedresources.com