This HackMD is based on The Turing Way collaboration cafe template

A permanent document exists on HackMD: https://hackmd.io/@environmental-ai/collaboration-cafe. It is regularly updated with the empty template for the next event.

The Environmental AI ⛰ 🌳 🏙️ ❄️ 🔥 🌊 online Collaboration Cafe

30 November 2021 | TBC

Thank you for joining The Environmental AI’s online Collaboration Cafe!

We’re delighted to have you here ☕ ✨ 🍰

When? 30 November 2021, 14:00 - 16:00 UTC (see in your time zone)

Next call: 25 January 2022

What? The Environmental AI is a community that aims to learn about and discuss good research practices for applying existing AI and data science solutions to better understand planet Earth across multiple environmental settings. Collaboration Cafes are online coworking calls open to anyone interested in learning about and discussing themes in AI and data science relevant to environmental studies.

Who? Everyone interested in reproducible, ethical, and inclusive data science and research for environmental studies is welcome to join all or any part of The Environmental AI project, community, and/or this call.

How? Join Zoom Meeting https://turing-uk.zoom.us/j/6779579342?pwd=L25scnhXUUNmVjFsc0hRWTAzTVJ1dz09

==The waiting room is enabled. The host of this call will let you in.==

All questions, comments, and recommendations are welcome!

Sign up below

Name + Break ice question + an emoji to represent it (emoji cheatsheet)

(Remember that this is a public document. You can use a pseudonym if you’d prefer.) ==If you are new to HackMD, please see this document for a short guide (right click, open in a new window): https://hackmd.io/@turingway/hackmd-guide.==

Conversation Starters

Advertise and promote your event or anything exciting you’re working on.

Agenda

Schedule

| Duration | Activity |
| --- | --- |
| Start | 👋 Welcome, code of conduct review |
| 10 mins | Introductions and personal goal setting |
| 25 mins | 🍅 1st Pomodoro session |
| 5 mins | ☕️ Break |
| 20 mins | 🍅 2nd Pomodoro session |
| 5 mins | ☕️ Break |
| 20 mins | 🍅 3rd Pomodoro session |
| 5 mins | ☕️ Break |
| 30 mins | Open discussion: celebrations, reflections and future directions |
| 5 mins | 👋 Close |

Breakout rooms: Topic proposals

If you have an idea for a topic you’d like to discuss in a breakout room, please add it below and put your name next to it. If you like one of the topics that are already suggested, please add your name next to that one. Teamwork makes the dream work. For more information about breakout rooms see the description on GitHub.

Topics for breakout / Names

Notes and questions

Request for reviews!

Feedback at the end of the call


Notes from the last call:

Archive: 26 October 2021 - Reproducibility in Environmental Science

Name + Share a song that expresses your personality + an emoji to represent it (emoji cheatsheet)

  • Alejandro + Should I Stay or Should I Go (The Clash) + 🧳

  • Sam + Wish you were here (Pink Floyd)

  • Matt - BBC Grandstand Theme - :horse_racing:

Conversation Starters

Advertise and promote your event or anything exciting you’re working on.

Breakout rooms: Topic proposals

Topics for breakout / Names

  • Matt, making a reproducible GitHub code repository for his MRes dissertation

  • Alejandro, preparing contribution guidelines for the Environmental AI book

  • Sam J, exploration of resources for reproducibility and feedback on Matt and Alejandro’s topics

Notes and questions

  • Sam J:

    • The Turing Way is a great resource to guide environmental scientists in reproducible research.

    • The Cornell Dataset Description is a good starting template for dataset documentation!

    • Standards in data catalogues, e.g. STAC (but it isn’t mature)

  • Alejandro:

    • Zenodo:

      • It is great to keep your sample data (up to 50 GB).

    • notebooksharing.space

      • A nice resource for sharing notebooks with interactive plotting (up to 10 MB). However, it doesn’t support tracking changes as ReviewNB does.

    • Contributors guidelines for the EnvAI book

      • Sam suggests including example environmental Python packages with links to notebooks (e.g. hvplot, geopandas, etc.)

      • Guidelines for a minimal publishable version, e.g. Binder

      • Use external links for general versioning principles, e.g. how to open a pull request on GitHub

      • Provide examples of how to create locked environments

      • Section of tools for sharing notebooks e.g. ReviewNB, notebooksharing.space

  • Matt

    • Publishing reproducible code for environmental science

      • Being able to reproduce the process can matter more than matching accuracies to the nearest 0.01%

      • Where the data owners aren’t happy to share the whole dataset, use a subset to demonstrate the tool (training and inference)

      • In environmental science, a visual demonstration of the results can be more useful than a command-line readout of accuracy

      • Suggest sensible ranges for hyperparameters in the documentation
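One of the guideline ideas above is to show contributors how to create locked environments. The sketch below is one possible illustration, not an agreed EnvAI book recipe: it records the exact version of every package installed in the current Python environment using only the standard library. The function names and the default file name `requirements.lock.txt` are hypothetical. In practice, `pip freeze` or `conda env export` are the usual tools for the same job.

```python
from importlib import metadata


def freeze_environment():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no name in their metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_lock_file(path="requirements.lock.txt"):
    """Write one pin per line (hypothetical default file name)."""
    pins = freeze_environment()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("".join(pin + "\n" for pin in pins))
    return pins
```

Calling `write_lock_file()` next to a notebook produces a file that can be passed to `pip install -r` to rebuild the same set of package versions.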

Request for reviews!

  • Sam J: reviewers needed for the SEVIRI wildfire data notebook of the EnvAI book, see PR#12

Feedback at the end of the call

  • None

Archive: 28 September - Data preprocessing

Name + What’s the hardest part about working virtually for you? and the easiest? + an emoji to represent it (emoji cheatsheet)

  • Alejandro + social interaction, more sleep time + :busts_in_silhouette: :sleeping:

  • Sam A. + I still have just as many meetings if not more and it is soooo tiring! :sleeping: :pleading_face:

  • Evangeline + Feeling self-conscious on camera, flexibility + :movie_camera: :clock1:

Conversation Starters

Advertise and promote your event or anything exciting you’re working on.

Breakout rooms: Topic proposals

Topics for breakout / Names

  • Sam A. Manufacture Urban Data in GIS format

  • Evie. Preprocessing satellite data for crop yield prediction

  • Alejandro. Preprocess FluxNet data and related gridded products

Notes and questions

  • We showcased the SEPAL platform for Vegetation Satellite Image analysis.

  • Discussed challenges around scoping and extracting satellite data for machine learning models of vegetation (agricultural crops):

    • Appropriate satellite platform (Sentinel/LANDSAT?)

    • Preprocessing of radar and optical data (i.e. dealing with cloud cover)

    • Appropriate time series/critical dates for plant growth

  • Sam A. used ArcGIS Pro to extract site-specific temperature information from a gridded netCDF dataset with the Spatial Analyst ‘Sample’ tool. The tool works across the time dimension, so one year of data can be processed in one go, and the output coordinate system can be set as needed. The results can be saved as a CSV file and processed further with standard Python tools such as pandas and numpy.

  • Sam A. suggests using the Iris package for reprojecting gridded netCDF files. The project is

  • Data preprocessing is still too time-consuming, and there is a lack of communication about the tools available.
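The site-extraction workflow in the notes above (sample a gridded dataset at fixed locations for every time step, then export a table) can be sketched in plain Python. This is not how the ArcGIS ‘Sample’ tool is implemented: the nearest-cell lookup, the function names, and the tiny synthetic grid are all assumptions made for illustration.

```python
import csv
import io


def nearest_index(axis_values, target):
    """Index of the axis value closest to target."""
    return min(range(len(axis_values)), key=lambda i: abs(axis_values[i] - target))


def sample_sites(times, lats, lons, grid, sites):
    """Sample grid[t][y][x] at the cell nearest to each site, for every time step.

    sites maps a site name to a (lat, lon) pair.
    Returns a list of (site, time, value) rows.
    """
    rows = []
    for name, (lat, lon) in sites.items():
        y = nearest_index(lats, lat)
        x = nearest_index(lons, lon)
        for t, stamp in enumerate(times):
            rows.append((name, stamp, grid[t][y][x]))
    return rows


def rows_to_csv(rows):
    """Serialise the sampled rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["site", "time", "temperature"])
    writer.writerows(rows)
    return buffer.getvalue()


# Synthetic 2 x 2 grid with three time steps (made-up values).
times = ["2021-01", "2021-02", "2021-03"]
lats = [50.0, 51.0]
lons = [-1.0, 0.0]
grid = [
    [[4.0, 4.5], [3.0, 3.5]],
    [[5.0, 5.5], [4.0, 4.5]],
    [[8.0, 8.5], [7.0, 7.5]],
]
sites = {"site_a": (50.2, -0.9), "site_b": (50.9, -0.1)}

rows = sample_sites(times, lats, lons, grid, sites)
print(rows_to_csv(rows))
```

The resulting CSV text has one row per site and time step, which is a convenient shape for loading into pandas for further processing.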

Request for reviews!

  • None

Feedback at the end of the call

  • None

Archive: 29 June - Data Visualization

Name + Something you watched recently (video, movie, documentary, etc.) that inspired you + an emoji to represent it (emoji cheatsheet)

Conversation Starters

Advertise and promote your event or anything exciting you’re working on.

  • Alejandro: EGU public call for session proposals (all other sessions). Deadline: 6 September 2021

  • Scott: The Pangeo European Community is growing, and there are plans for coffee chats and regular showcase meetings (see here)

Breakout rooms: Topic proposals

Topics for breakout / Names

  • Sam J: Regridding MODIS data for wildfire detection

  • Tom: Produce script to reproduce IceNet paper figures for Nature Communications

  • Emily: Visualization of LiDAR data

  • Scott: Organizing and administering the EnvSensors WPs project timetable

  • Alejandro: Deploying visualization outputs of a FluxNet use case for the Environmental AI book

Notes and questions

  • Emily showed a cool visualization of a laser scan image (100 GB) using the proprietary software of the scanner device. After data preprocessing, she will use libraries for visualizing individual trees.

    • Emily says there are also some radar sensors that collect soil data.

  • Tools for regridding MODIS data. Sam is using satpy. Suggestions of other existing tools are welcome.

  • Tom is making his code cleaner (i.e. modular) and more efficient (i.e. using dask).

  • Alejandro showed a FluxNet demo

    • Emily suggests adding woodlands and shrubs to the FluxNet data subset.
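For readers new to the regridding topic raised in the notes above, the basic idea can be sketched as averaging scattered observations into the cells of a regular latitude/longitude grid. This is only a conceptual illustration with made-up values, and the function name and parameters are hypothetical. It does not use satpy, whose resampling is considerably more sophisticated.

```python
import math


def regrid_mean(points, lat_min, lon_min, cell_size, n_rows, n_cols):
    """Average scattered (lat, lon, value) points into a regular grid.

    Cells that receive no observations are left as None.
    """
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon, value in points:
        row = math.floor((lat - lat_min) / cell_size)
        col = math.floor((lon - lon_min) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            sums[row][col] += value
            counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Made-up observations: (lat, lon, value)
points = [
    (10.2, 20.1, 300.0),
    (10.4, 20.3, 310.0),
    (11.5, 21.5, 320.0),
    (15.0, 20.0, 999.0),  # falls outside the target grid and is ignored
]
regridded = regrid_mean(points, lat_min=10.0, lon_min=20.0, cell_size=1.0, n_rows=2, n_cols=2)
print(regridded)
```

Leaving empty cells as `None` keeps missing data explicit, so a later step can decide whether to interpolate or mask those cells.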

Feedback at the end of the call

  • Add a disclaimer that Collaboration Cafe HackMDs are public.

  • Names for breakout rooms.

  • We should aim to keep to time, once we are used to the format etc.