Sarah Alnegheimish’s research sits at the intersection of machine learning and systems engineering. Her goal: to make machine learning systems more accessible, transparent, and trustworthy.
A PhD student in Principal Research Scientist Kalyan Veeramachaneni’s Data-to-AI group at MIT’s Laboratory for Information and Decision Systems (LIDS), Alnegheimish develops Orion, an open-source, user-friendly machine learning framework and time series library for unsupervised anomaly detection in large-scale industrial and operational settings.
Early Influence
Growing up in a household where education was highly prized—a university professor for a father and a teacher educator for a mother—Alnegheimish quickly grasped that knowledge should be shared broadly. “Experiencing this environment instilled in me the desire to make machine learning tools more accessible,” she reflects. Her firsthand experience with open-source resources only deepened this commitment. “Accessibility is crucial for technology adoption. The essence of open-source development is ensuring that new technology is reachable and evaluable by those who truly need it.”
Alnegheimish earned her bachelor’s degree from King Saud University (KSU) as part of the pioneering cohort of computer science majors. “Before this program launched, the only computing major available was IT,” she recalls. Being part of the inaugural class was thrilling, albeit challenging. “With all faculty teaching unfamiliar material, I had to engage in substantial self-learning. That’s when I first encountered MIT OpenCourseWare, which served as an invaluable resource for me.”
Shortly after graduating, Alnegheimish joined King Abdulaziz City for Science and Technology (KACST), a national lab in Saudi Arabia. There she collaborated with Veeramachaneni through the Center for Complex Engineering Systems (CCES), an experience that drew her to MIT for graduate studies, with his research group as her top choice.
Creating Orion
Alnegheimish’s master’s thesis focused on time series anomaly detection: spotting unexpected behaviors in data that can yield vital insights. Unusual patterns in network traffic can indicate cybersecurity threats, abnormal sensor readings in machinery can warn of impending failures, and monitoring patient vital signs can help avert health complications. It was during this research that she began designing Orion.
Orion provides statistical and machine learning models that are continuously logged and maintained, allowing users without machine learning expertise to work with the code. They can analyze signals, compare anomaly detection methods, and investigate anomalies in an end-to-end pipeline. Importantly, the framework, code, and datasets are all open source.
“Open-source fosters accessibility and transparency. Users can navigate the code to understand how the model operates, which significantly boosts transparency,” Alnegheimish explains. She believes this transparency cultivates trust in the model, allowing users to witness its reliability firsthand.
“We aim to consolidate various machine learning algorithms into one platform, enabling anyone to use our models directly,” she states. “It’s not just for our MIT sponsors; everyday users install it from our library and apply it to their data. It’s proving to be a valuable resource for accessing the latest anomaly detection methods.”
Repurposing Models for Anomaly Detection
In her PhD work, Alnegheimish is taking anomaly detection in Orion further. “Initially, every machine learning model had to be trained on each new dataset. Now, we can use pre-trained models,” she explains. This shift saves time and computational resources. However, most pre-trained models are not designed for anomaly detection. “These models were primarily intended for forecasting, not anomaly identification,” she notes. “We are repurposing them through prompt engineering, without additional training.”
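The article does not spell out how a pre-trained forecaster gets repurposed, but the general pattern is to use its forecasts zero-shot and flag points where the forecast error is unusually large. The sketch below illustrates that idea with a hypothetical forecaster object exposing a predict(history) method; it is an illustration of the forecast-then-flag approach, not Orion’s actual implementation or her prompt-engineering strategy.

    import numpy as np

    def zero_shot_anomalies(signal, forecaster, window=100, k=3.0):
        """Flag points where a pre-trained forecaster's error is unusually large.

        `forecaster` is assumed to be any pre-trained model exposing a
        predict(history) method that returns the next value; it is used
        as-is, with no additional training (zero-shot).
        """
        errors = np.zeros(len(signal))
        for t in range(window, len(signal)):
            prediction = forecaster.predict(signal[t - window:t])
            errors[t] = abs(signal[t] - prediction)

        # Simple threshold: flag errors more than k standard deviations
        # above the mean forecast error.
        scores = errors[window:]
        threshold = scores.mean() + k * scores.std()
        return np.where(errors > threshold)[0]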
Alnegheimish posits that pre-trained models, having already learned the patterns of time series data, may inherently possess some capacity to detect anomalies. While they do not yet outperform models trained on specific datasets, she is optimistic about their potential.
Accessible Design
Alnegheimish has devoted substantial effort to making Orion accessible. “Before I came to MIT, I believed that the primary focus of research was solely on model development. Over time, I learned that making your research accessible and applicable to others requires creating systems that enhance usability,” she explains. Throughout her graduate studies, she has developed models and systems in parallel.
A pivotal aspect of her system development involves establishing the right abstractions to integrate with her models. These abstractions offer universal representations with streamlined components. “Every model follows a sequence of steps from raw input to desired output. By standardizing input and output, we’ve allowed for flexibility in the intermediary processes. Thus far, every model we’ve tested has successfully fitted into our abstractions,” she asserts. “These abstractions have proven to be stable and dependable for the past six years.”
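To make the standardized-contract idea concrete, here is a minimal, hypothetical sketch of such an abstraction: every step exposes the same interface, the pipeline’s input is a raw signal, its output is a list of anomalous intervals, and the intermediate steps are free to vary. The class names are illustrative and not Orion’s real internals, so treat this as a reading aid rather than the library’s API.

    from typing import Any, Dict, List, Tuple


    class Primitive:
        """One step in a pipeline (hypothetical fit/produce-style interface)."""

        def fit(self, **data: Any) -> None:
            """Learn any internal state this step needs; optional."""

        def produce(self, **data: Any) -> Dict[str, Any]:
            """Transform the inputs and return new named outputs."""
            raise NotImplementedError


    class AnomalyPipeline:
        """Chains primitives behind one standardized contract:
        a raw signal goes in, a list of anomalous intervals comes out."""

        def __init__(self, steps: List[Primitive]):
            self.steps = steps

        def fit(self, signal) -> None:
            data: Dict[str, Any] = {"signal": signal}
            for step in self.steps:
                step.fit(**data)
                data.update(step.produce(**data))

        def detect(self, signal) -> List[Tuple[int, int]]:
            data: Dict[str, Any] = {"signal": signal}
            for step in self.steps:
                data.update(step.produce(**data))
            # The final step is expected to emit "anomalies": (start, end) pairs.
            return data["anomalies"]

Because only the endpoints are fixed, any model whose steps can be expressed as primitives like these slots into the same pipeline without changing how users call it.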
Alnegheimish’s ability to simultaneously develop systems and models shines through in her mentorship. While working with two master’s students, she provided them with the system and relevant documentation. “Both successfully built their models using our abstractions, confirming we’re on the right track,” she enthuses.
She has also explored the potential of large language models (LLMs) as intermediaries between users and systems. The LLM agent she developed interfaces with Orion, enabling users to interact with the system without needing in-depth knowledge. “Consider ChatGPT. You don’t need to understand its underlying mechanics; it’s accessible to everyone,” she illustrates. In her software, users need to know only two commands: Fit, to train their model, and Detect, to identify anomalies.
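In practice, a minimal session with the open-source library looks roughly like the following sketch, based on the public Orion repository’s documentation; the demo signal identifier and pipeline name are taken from its examples and may differ across versions.

    from orion import Orion
    from orion.data import load_signal

    # Load a demo signal shipped with Orion (or pass any pandas DataFrame
    # with 'timestamp' and 'value' columns).
    data = load_signal('S-1')

    # Choose one of the registered anomaly detection pipelines.
    orion = Orion(pipeline='lstm_dynamic_threshold')

    orion.fit(data)                  # "Fit": train the selected pipeline
    anomalies = orion.detect(data)   # "Detect": returns detected anomalous intervals

The result of detect is a small table of anomalous intervals that users can inspect against the original signal.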
“My ultimate aspiration is to democratize AI,” she shares. Orion has already surpassed 120,000 downloads, and more than a thousand users have starred the repository on GitHub. “Traditionally, research impact was measured by citations and publications. Now, we have real-time evidence through open-source adoption.”
Photo credit & article inspired by: Massachusetts Institute of Technology