Best of Data Science — November 2023
Towards Data Science·2y
System Design Cheatsheets: ElasticSearch
Elasticsearch is a search engine that excels at full-text search over large datasets. It can serve as a secondary database for full-text search operations, as a real-time data analysis pipeline, or as the backbone of a recommendation system. However, it is a poor fit for workloads that require ACID transactions or complex joins, and for small datasets with simple query needs. When using Elasticsearch in system design, consider its distributed architecture, scalability, document-based data modeling, real-time data analysis capabilities, and cost implications.
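To make the document-based model and full-text search more concrete, here is a minimal sketch of what a document and a `match` query body look like when built in Python. The index name `products` and all field names are hypothetical and not taken from the article; the snippet only builds the JSON payloads and does not contact a cluster.

```python
import json

# Hypothetical product document: Elasticsearch stores data as JSON documents.
# The field names here are illustrative assumptions.
document = {
    "name": "Acme Headphones",
    "description": "Wireless noise cancelling over-ear headphones",
    "price": 129.0,
}

# Query DSL body for a full-text `match` query against the description field.
search_body = {
    "query": {
        "match": {
            "description": "wireless noise cancelling"
        }
    }
}

# These payloads would be sent as JSON, e.g. to a hypothetical
# `products` index via its `_search` endpoint.
document_json = json.dumps(document, indent=2)
search_json = json.dumps(search_body, indent=2)
print(search_json)
```

Because the text field is analyzed at index time, a `match` query of this shape finds documents by their individual terms rather than by exact string equality, which is what distinguishes it from a simple lookup in a relational database.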
LambdaTest·3y
A Complete Guide to Web Scraping with Python
Web scraping is a powerful tool for collecting data from websites. Python is a popular programming language for web scraping, with libraries like BeautifulSoup and Selenium making the process easier. Web scraping can be used for various purposes, such as competitor analysis, lead generation, and data analysis. It's important to be aware of the legal and ethical considerations of web scraping and to comply with a website's terms of service.
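As a rough illustration of the extraction step, the sketch below pulls link text and URLs out of a page. The summary names BeautifulSoup and Selenium; this example deliberately swaps in the standard library's `html.parser` so it runs without extra dependencies. The sample HTML and the names `LinkCollector` and `extract_links` are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Inline sample page; a real scraper would fetch this over HTTP,
# subject to the site's terms of service.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <a href="/widgets">Widgets</a>
  <a href="/gadgets">Gadgets</a>
</body></html>
"""


def extract_links(html):
    """Parse an HTML string and return the links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```

BeautifulSoup offers a more convenient interface for the same kind of task, and Selenium is typically used when a page must be rendered in a browser before its content is available.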
AI in Plain English·3y
How I Deployed a Machine Learning Model for the First Time
The article walks through deploying a machine learning model for the first time. It opens with the Kaggle competition and the wine quality dataset, then covers exploratory data analysis and preprocessing: feature engineering, transforming distributions, standard scaling, and clustering. The pipeline is built with scikit-learn's Pipeline class; the best-performing model, CatBoostClassifier, is fine-tuned and added to it. Finally, the author builds a Streamlit app on Hugging Face to host the model, reflects on the journey, and encourages others to explore machine learning deployment.
Code Like A Girl·3y
SQL’s Order of Execution
Understanding the order in which a SQL query executes is crucial for optimizing performance and obtaining accurate results. Logically, clauses are evaluated in the order FROM, WHERE, GROUP BY, SELECT, and ORDER BY, which differs from the order in which they are written. The physical order is chosen by the database engine's query optimizer, which relies on techniques such as cost-based optimization, index usage, and parallel processing, and is also shaped by storage structures; it can therefore vary between databases. Subqueries are integrated into this execution flow and affect how the query is processed. To write efficient queries, index appropriately, limit the use of SELECT *, and optimize JOIN operations. To troubleshoot issues related to execution order, work through the logical flow, examine the query execution plan, check index usage, profile the query, and optimize the conditions in the WHERE clause.
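The logical order can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `orders` table (the table, its columns, and its rows are invented for illustration). It also includes a HAVING clause, which the summary does not list but which sits between GROUP BY and SELECT in the logical order. The numbered comments mark the logical step of each clause.

```python
import sqlite3

# In-memory database with a small, made-up orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES
  ('ana', 50), ('ana', 200), ('ben', 30), ('ben', 40), ('cy', 500);
""")

query = """
SELECT customer, SUM(amount) AS total   -- 5. compute output columns
FROM orders                             -- 1. pick the source rows
WHERE amount >= 40                      -- 2. filter individual rows
GROUP BY customer                       -- 3. form groups
HAVING SUM(amount) >= 100               -- 4. filter whole groups
ORDER BY total DESC                     -- 6. sort the final result
"""

rows = conn.execute(query).fetchall()
print(rows)

# The physical plan is the optimizer's choice; its text varies by
# SQLite version, so it is printed here without an expected value.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)
```

Because WHERE runs before grouping, the row `('ben', 30)` is discarded first, leaving ben with a total of 40, which HAVING then removes. The query therefore returns cy with 500 followed by ana with 250.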