Best of Big Data 2022

  1.
    Article
    Mercari Engineering · 4y

    Building a simple search API

    This article is for day 5 of Merpay Tech Openness Month 2021. It describes a solution that makes it easy to add a search API to a project. Data is loaded from BigQuery (the data source to search) into Apache Solr (the search engine). The search server runs on Cloud Run, and Cloud Dataflow is used for search index generation.

  2.
    Article
    Community Picks · 4y

    The Guide to Modern Data Architecture

    This is an updated version of a post we originally published in 2020. We argue that core data processing systems have remained relatively stable over the past year, while supporting tools and applications have proliferated rapidly. We explore the hypothesis that platforms are beginning to emerge in the data ecosystem, and that this helps explain the particular patterns we’re seeing.

  3.
    Article
    Pointer · 4y

    The Guide to Modern Data Architecture

    This is an updated version of a post we originally published in 2020. We argue that core data processing systems have remained relatively stable over the past year, while supporting tools and applications have proliferated rapidly. We explore the hypothesis that platforms are beginning to emerge in the data ecosystem, and that this helps explain the particular patterns we’re seeing.

  4.
    Article
    ByteByteGo · 3y

    EP36: Types of Databases and Use Cases

    Ilum is a Spark cluster manager and monitoring tool. Ilum provides an all-in-one solution for: Apache Spark Cluster management and monitoring service Hadoop. The core of Elasticsearch lies in its data structures and indexing. It is important to understand how Elasticsearch builds the term dictionary using an LSM tree.

  5.
    Article
    DZone · 4y

    How the AI Behind TikTok Works

    TikTok is a video-sharing app that lets users create and share short videos. It impresses users with its precisely personalized "just for you" recommendations, which are powered by artificial intelligence. A range of machine learning and deep learning algorithms and techniques are applied to build models and generate recommendations.

  6.
    Article
    Honeypot · 4y

    The Best Programming Conferences to Attend in 2022

    Developers thrive by connecting with other programmers, who may then become business partners or facilitators when looking for a job. CNBC estimates that 80% of jobs are filled thanks to networking, instead of through a traditional job opening system. Programming conferences are places to connect with people who want to dive deep into specific know-how about a programming language.

  7.
    Article
    Towards Data Science · 4y

    Anomaly Detection in SQL

    Z-scoring is a simple, unsupervised statistical technique for flagging outliers. It is a powerful, minimalist anomaly detection model that can be implemented quickly and flexibly in the data warehouse. The article's model is based on a fictitious transaction-level table in Snowflake.
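
To make the z-scoring idea concrete, here is a minimal Python sketch rather than the article's SQL. The function name `zscore_outliers`, the threshold of 3.0, and the sample amounts are all assumptions made for illustration; they are not taken from the article. Each value is scored as z = (value - mean) / standard deviation, and values whose absolute z-score exceeds the threshold are flagged.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z) for every value whose |z| exceeds threshold.

    Hypothetical helper for illustration; uses the population standard
    deviation of the whole series as the baseline.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    flagged = []
    for i, v in enumerate(values):
        z = (v - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Made-up transaction amounts: twenty ordinary values and one spike.
amounts = [10.0] * 20 + [100.0]
for index, amount, z in zscore_outliers(amounts):
    print(f"row {index}: amount={amount} z={z:.2f}")
```

In a data warehouse the same calculation would typically be written with aggregate or window functions for the mean and standard deviation; the threshold is a tuning choice, not a fixed rule.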