Introduction
Keeping up with data science can feel overwhelming. New libraries, research papers, datasets, and tools appear every day, and following newsletters or social media threads alone rarely gives a complete picture. Instead, I've found that a small set of well-chosen bookmarks works as a central hub for my research, coding, and data analysis. This collection has become my daily compass. Here are the 10 bookmarks I rely on for inspiration, learning, and efficiency.
1. arXiv: Machine Learning (cs.LG) New Papers
One of my go-to resources for cutting-edge machine learning research is arXiv, specifically its cs.LG (Computer Science: Machine Learning) listing. It collects the latest preprints covering everything from theoretical foundations to practical applications in natural language processing (NLP), computer vision, and reinforcement learning (RL). Bookmarking the new-submissions page lets me check regularly for papers that might spark ideas or inform my projects, so I see new methods before they become mainstream.
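If you prefer a script to a browser tab, the same listing can be pulled from arXiv's public Atom API. The following is a minimal sketch, not an official client: the helper names `build_query_url`, `parse_feed`, and `fetch_latest` are hypothetical, and it assumes the standard `export.arxiv.org/api/query` endpoint with the `cat:` search prefix.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Atom namespace used by the arXiv API feed.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(category="cs.LG", max_results=5):
    """Build an arXiv API URL for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return "https://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract title and link from each <entry> in an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        # Titles in the feed can contain line breaks; collapse the whitespace.
        papers.append({"title": " ".join(title.split()), "link": link.strip()})
    return papers


def fetch_latest(category="cs.LG", max_results=5):
    """Download and parse the feed (performs a network request)."""
    with urllib.request.urlopen(build_query_url(category, max_results)) as resp:
        return parse_feed(resp.read())
```

Calling `fetch_latest()` returns a list of dictionaries that you can print or save. Keep the request rate low, since this is a shared public service.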
2. GitHub Trending Python Repos
Keeping an eye on trending projects is useful for any data scientist. The GitHub Trending page, filtered to Python and set to the weekly view, surfaces popular repositories and introduces me to new libraries and experimental tools. Data science is as much about the tools we use as the algorithms behind them. A quick 10-minute scan of this page often turns up something worth trying in my own work. I recommend making it a weekly habit to keep your toolkit fresh.
3. Data Is Plural
For those on the lookout for unique and interesting datasets, Data Is Plural is a fantastic resource. This newsletter and archive curates datasets that can serve as the cornerstone for project ideas and hackathon challenges. Each entry includes a brief description and a link, allowing for quick access. This resource expands beyond the usual sources like Kaggle, introducing data that you may not find elsewhere and helping to fuel creative data explorations.
4. The Rundown AI
Keeping track of the latest developments in AI and machine learning can be time-consuming. This is where The Rundown AI comes in. It aggregates the top news articles and research papers in the field, saving me the time I would otherwise spend searching for relevant updates. The concise overviews keep me informed about new tools, methodologies, and breakthroughs.
5. RAWGraphs
Visualization is a key aspect of data science, and RAWGraphs is a user-friendly tool that simplifies this process. As a free, browser-based platform, it allows me to create clean, customizable visualizations directly from CSV or JSON files without delving into complicated code. This not only saves time but also helps in effectively presenting findings by exporting professional-looking charts in vector formats—ideal for reports and presentations.
6. Quartz Bad Data Guide
Data cleaning is often one of the most tedious parts of data science. That's why I turn to the Quartz Bad Data Guide when tackling messy datasets. The guide describes common problems such as missing values, garbled text, inconsistent formatting, and misentered data, and offers practical suggestions for handling each one. It groups the issues by who is best placed to fix them, which saves me time and makes troubleshooting more systematic.
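To make a couple of these checks concrete, here is a minimal sketch of a first-pass audit using only the Python standard library. It is not taken from the guide itself: the function name `audit_csv` and the `MISSING_TOKENS` set are my own assumptions about what counts as a missing value, and you would adjust them for your data.

```python
import csv
import io
from collections import Counter

# Assumed spellings of "missing"; extend this set for your own datasets.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "-"}


def audit_csv(text):
    """Count missing-looking and whitespace-padded cells per column."""
    reader = csv.DictReader(io.StringIO(text))
    missing = Counter()
    padded = Counter()
    for row in reader:
        for column, value in row.items():
            if column is None:
                # Extra cells beyond the header are collected under None; skip them.
                continue
            value = value or ""  # short rows yield None for absent cells
            if value.strip().lower() in MISSING_TOKENS:
                missing[column] += 1
            elif value != value.strip():
                padded[column] += 1
    return dict(missing), dict(padded)


if __name__ == "__main__":
    sample = "name,age\nAda,36\n Bob ,N/A\n,41\n"
    print(audit_csv(sample))
```

A report like this only flags suspicious cells. Deciding whether to fix, drop, or query each one is still a judgment call.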
7. Five Minute Stats
When I need a quick refresher on statistical concepts, Five Minute Stats is my go-to reference. It collects short notes on essential concepts and formulas, so I can brush up on topics like hypothesis testing, probability distributions, and correlation in just a few minutes. It's especially useful when checking calculations or preparing lessons, and it spares me from sifting through textbooks.
8. Awesome Data Analysis
The Awesome Data Analysis GitHub repository is another cornerstone of my research toolkit. This collection encompasses a wide range of tools and resources for data cleaning, manipulation, visualization, and even building machine learning pipelines. By bookmarking this site, I ensure that I have quick access to reliable resources, whether I’m refreshing my skills or sharing tools with colleagues and students.
9. Mockaroo
Creating realistic datasets for testing can be a daunting task, but Mockaroo makes it easy to generate random data and mock APIs. This tool allows me to create datasets in various formats like CSV, JSON, SQL, or Excel, saving me the hassle of manual entry. It’s especially useful for testing code, dashboards, or machine learning workflows, as I can create edge cases that mimic real-world scenarios without tedious manual work.
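When I only need a handful of rows and want them reproducible, a local stand-in works too. The sketch below does not use Mockaroo or its API. It is a plain standard-library illustration of the same idea (random but realistic rows plus a deliberate edge case), and the field names and helper names `make_mock_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import random


def make_mock_rows(n, seed=0):
    """Generate n random rows plus one deliberate edge-case row."""
    rng = random.Random(seed)  # seeded so test data is reproducible
    first_names = ["Ada", "Grace", "Alan", "Edsger"]
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "name": rng.choice(first_names),
            "age": rng.randint(18, 90),
            "signup_score": round(rng.uniform(0, 1), 3),
        })
    # Edge case: empty name and zero values, to exercise validation code.
    rows.append({"id": n + 1, "name": "", "age": 0, "signup_score": 0.0})
    return rows


def rows_to_csv(rows):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(make_mock_rows(5, seed=42)))
```

For larger or more varied datasets, a dedicated generator remains the more practical choice. This sketch is mainly useful inside unit tests.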
10. Foorilla
Finally, for job seekers in the tech and data sector, Foorilla is a valuable platform. It aggregates job listings and lets me browse opportunities that match my interests. I can follow companies, filter jobs by location or remote options, and export job lists in CSV or JSON format to keep my own records. It simplifies job hunting and keeps me aware of new opportunities in the market.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT” and is recognized as a Google Generation Scholar 2022 for APAC. An advocate for diversity in STEM, she founded FEMCodes and champions academic excellence. She’s also a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar.