MPE-NOVA data flow face-to-face meeting 11+12jul2022
MPE+NOVA data flow teams face-to-face meeting 11+12 July 2022
Participants: Yixian, Erich Wiezorrek, Gijs, Yves (part), Eckhard (part), Willem-Jan (part, remote only). Ric does not participate live but shall be kept in the loop about results and proposed decisions; as data flow team lead, Ric has the final say.
Objectives of the visit
By end of meeting we have accomplished the following:
- Relinquish the Past: the hand-over of WP Imaging from NOVA to MPE is completed. The MPE WP Imaging team shall have no more dependence on NOVA team and no need for NOVA input to proceed with their work.
- Conduct the Present: there is a procedure in place for how WP Imaging keeps the MicadoWISE team informed about updates to the imaging pipeline (e.g., changes in data items, workflows, recipe inputs/outputs) so that the MicadoWISE team can update BasicMicadoWISE to continue supporting the WP Imaging team with:
a. a web-based queryable data archive with raw and processed simulated and lab data for the 3 imaging modes
b. a JupyterHub python environment in which the ESO imaging recipes can be embedded for interactive scientific data quality assessment.
- Open the Future: the NOVA MicadoWISE team has gathered MPE's list of desirable functionalities for an interactive scientific data quality analysis system that complements the EDPS data processing system. The duty of that complementary system is to support the MICADO instrument team through commissioning, the shake-down phase of ELT and MICADO, and then the early science harvest. This list of desirable functionalities will be the basis for the NOVA MicadoWISE team to seek funding opportunities (proposal submission is not anticipated before 1 Jan 2024) for building a system that has those functionalities, starting the development from the current basic version of MicadoWISE.
Agenda
Mon morning:
- Discuss already planned workflow / data model changes
- In particular: Persistence, Non-linearity, Data item namings (proposal will be shared by Hugo and Gijs before the meeting)
- See https://gitlab.astro-wise.org/micado/datareductionlibrarydesigndrs/-/merge_requests/121 and https://gitlab.astro-wise.org/micado/datareductionlibrarydesigndrs/-/jobs/49620/artifacts/raw/ELT-SPE-MCD-56305-0019_DataReductionLibraryDesignDRS.pdf
- Discussion on how to share design and how to propagate design changes.
- Guided tour by NOVA through current Digital Design and its proposed maintenance plan as described in post-FDR memo
- How far can we get with existing (semi-)deliveries, e.g. IRDB + DID + EDPS + C-code + ICS-code?
- Which data products are used in the pipeline? That is, which data items exist, which FITS extensions do they have, and which FITS header keywords do they have? This is not described in any ESO delivery, but it is essential for MicadoWISE because it corresponds to database tables and columns, so we need to define this ourselves.
- Decide on a collaborative plan for who is responsible for what in the Digital Design (which includes e.g. IRDB + DID + EDPS) and how to propagate information. That is, which location is the master copy of each bit of information, and how/when/by whom is this copied to the rest?
- In particular: how do we coordinate large workflow / data model updates; e.g. separate branches, separate databases?
Mon afternoon:
- Tackle remaining topics mentioned in RIXs that have not yet been discussed
- For each AI of a RIX, check that it is now clear how it will be addressed, then change the assignee to an MPE person (resolved RIX-AIs from Dec2021, RIX-AIs with deadline 01Nov2022, RIX-AIs with deadline MidTermReview)
- Hand-over NOVA docs in MICADO data flow documentation portal from NOVA to MPE:
Tue morning (with Willem-Jan remotely):
- Guided tour by NOVA through MicadoWISE imaging data archive (DBview).
- Delivery by NOVA to MPE of a MicadoWISE JupyterNotebook running ScopeSIM to create raw darks & flats that are then processed using Yixian's masterdark and masterflat ESO recipes using the Groningen compute cluster and the MicadoWISE imaging archive for its input and output. This JupyterNotebook will continue to be technically supported by MicadoWISE team.
Tue afternoon:
- 13:30 Meeting with Yves
- Tick off open FDR AI for RIXes:
- Discuss EDPS-rules
- Discuss 'data item dictionary'
- Discuss granularity of recipes (see memo explaining the choice for fine-grained recipes in the FDR pipeline)
- if time permits: depersist strategy, the merging with PSF-R software, and the regression tests
- 16:00: Meet with Eckhard: debrief on meeting results.
- 16:30 Contingency (but Gijs has Euclid project videocon until 18:00)
Conclusions
Below are the agreements made between the NOVA team (Hugo and Gijs), the MPE team (Yixian), and to some extent ESO (Yves). This list primarily includes the agreements between the NOVA Imaging team and the MPE Imaging team. Agreements internal to NOVA or MPE, or between ESO (Yves) and either team, are only included where relevant. The agreements are ordered based on the order of the "goals of the meeting" in the agenda.
- The past: Handover to MPE of the documents (DRLD, CP, DRLVT).
- For all RIXes there is a plan to get them resolved, and discussed with Yves when necessary. Some are still to be discussed with the spectroscopy team; agreement is expected. 3 RIXes are handed over to YC, to be resolved at a later stage:
- https://jira.eso.org/browse/MIC-2131 Image Analysis Library or DPS-ICS libraries
- https://jira.eso.org/browse/MIC-2200 The trigger of the pipeline recipe is a template, not an OB.
- https://jira.eso.org/browse/MIC-2138 The detailed description of the tests will be developed in future updates of the document
- The NOVA team will ensure all other RIXes will get the status of “Resolved” before the handover. (Hugo will do these (TBC), Gijs will do rest.)
- Still to be discussed with spectroscopy is the naming of the data items / CPL recipes / DRL functions, and the shared parts of our pipeline. We expect that to go well.
- The documents will be updated to make them useful as a reference document for the MPE team. In particular, this means that
- The DRLD will be updated to reflect the changes we agreed upon (see below).
- E.g. the naming conventions are explained; this is more important than actually using the correct names.
- It will be indicated in the text where it is already anticipated that MPE might adapt the design, for example the PRO.CATG keywords.
- There is no need to hand over the scripts for the automatic generation of the information in the DRLD. The MPE team can still use them if they want.
- MPE will first focus on the User Manual, which has to be delivered before the next version of the DRLD, and will then later update the DRLD based on the User Manual.
- The present: Propagating pipeline changes back from the MPE team to the NOVA team, as needed for BasicMicadoWISE (the archive and distributed processing):
- The desire of the NOVA team is to have a machine readable “Single Source of Truth” for the various pieces of information that are shared between the teams. (Different information can have a different source though.) This will make it possible to
- at least automatically verify that BasicMicadoWISE uses the correct truth, in case the information has to be copied by hand, or even better,
- directly use the correct information automatically where possible, e.g. through code generation.
- We determined that almost all information needed for BasicMicadoWISE can be derived from ESO-deliveries, although the details on how to best do that need to be investigated:
- EDPS rules: the NOVA team anticipates that they can parse/use EDPS rules to get
- a list of data items that exist,
- the workflow orchestration, and
- some of the header keywords,
- CPL recipes: from the documentation, or otherwise from calling esorex, it will be possible to infer:
- which recipes exist,
- what process parameters they have,
- what input / output they have, which should overlap with the EDPS rules
- QC parameters list (ESO-ELT-DIC.MICADO_QC), for which
- the "Context" property of each parameter will be used to indicate which data items have that QC parameter, and
- we need to decide where this file will be stored and shared (currently in MicadoWISE git repository); perhaps it can be included with the pipeline code or with the IRDB.
- To keep this feasible, a minimal set of FITS header keywords & values will be stored in the queryable database of the archive:
- Keywords from the template descriptions, which is a small enough set to share manually if necessary (see below).
- Keywords from the IRDB (that is, hardware description), which will be agreed upon between the NOVA team and the simulator team.
- Keywords relevant for constructing the workflow, which can be derived from the EDPS rules.
- QC parameters; from ESO-ELT-DIC.MICADO_QC.
- Process parameters; from the CPL recipes.
- Provenance; from the EDPS rules and/or the CPL recipes.
- Any other header keywords can optionally be included; but are not essential.
- The Instrument Reference Database (IRDB) will also provide relevant information, of which the Simulator team is the custodian, because it is also needed for ScopeSIM.
- The NOVA team will rely on the IRDB. How information from the MPE team is shared with the Simulator team (if any) is between them.
- Of particular interest is information about templates, for which we do not yet have a (new) single source of truth location.
- The template information is small enough to be shared informally as a fallback if necessary.
- ScopeSIM has the desire to process full templates (currently it can do only single exposures); that would require the template information to be in the IRDB, and MicadoWISE would then automatically be able to use it.
- There will (at some point) be a machine readable representation of the templates by the ICS team, so using that might also be a possibility.
- The current digital design files (e.g. the VO-DML representations, Python files, yaml files etc in the MicadoWISE gitlab repository) will not be used by the MPE team, with the exception of ESO-ELT-DIC.MICADO_QC. These files will therefore be maintained by the NOVA team, or discarded if they are not needed anymore (their primary goal was to aid the design at a time when there was no other representation of that information).
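As an illustration of the "automatically verify" option under "The present", a minimal sketch in Python follows. It assumes that both the source of truth (e.g. something derived from the EDPS rules) and BasicMicadoWISE can be reduced to a mapping of data item name to a set of header keywords. The function name, the data item names and the keywords are hypothetical placeholders and not part of any ESO delivery or of MicadoWISE.

```python
def compare_with_source_of_truth(truth, archive):
    """Compare two mappings of data item name -> set of header keywords.

    'truth' stands for the machine readable source of truth and 'archive'
    for what BasicMicadoWISE currently defines (both hypothetical here).
    """
    truth_items = set(truth)
    archive_items = set(archive)
    report = {
        "missing_in_archive": sorted(truth_items - archive_items),
        "unknown_in_archive": sorted(archive_items - truth_items),
        "keyword_mismatches": {},
    }
    # For data items known to both sides, compare the keyword sets.
    for name in sorted(truth_items & archive_items):
        missing = sorted(set(truth[name]) - set(archive[name]))
        extra = sorted(set(archive[name]) - set(truth[name]))
        if missing or extra:
            report["keyword_mismatches"][name] = {"missing": missing, "extra": extra}
    return report


# Hypothetical example data; real names would come from the shared sources.
truth = {
    "ITEM_DARK": {"KEYWORD_A", "KEYWORD_B"},
    "ITEM_FLAT": {"KEYWORD_A", "KEYWORD_C"},
}
archive = {
    "ITEM_DARK": {"KEYWORD_A"},
    "ITEM_OLD": {"KEYWORD_A"},
}
report = compare_with_source_of_truth(truth, archive)
```

A check like this could run whenever the MPE team makes an internal development release, so that differences show up as a report rather than as a failed integration.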
- The future: The future of (Basic)MicadoWISE (the archive and distributed processing).
- In the short term (~the next year):
- The archive can be used to create and/or store simulated data. For this, the NOVA team will ensure that a) data simulated directly through ScopeSIM can be ingested, and b) data simulated through MicadoWISE will be equivalent to data simulated through ScopeSIM directly.
- The development and testing of the recipes can be done locally by MPE and does not require MicadoWISE. It is therefore not urgent for MicadoWISE to support running the latest CPL recipes, or vice versa to ensure the CPL recipes are always runnable by MicadoWISE. This will allow the teams to work at their own pace; e.g. it is not directly problematic if the EDPS rules and CPL recipes are out of sync with each other or with MicadoWISE.
- Nevertheless, the NOVA team will frequently attempt integration of the CPL recipes into BasicMicadoWISE, for example when the MPE team makes an internal development release. This will ensure we detect integration problems early and will strengthen our collaboration.
- Medium term (~after a year):
- The validation of the (already tested) recipes would benefit from the archive and distributed processing; especially for the critical algorithms (mainly for astrometric imaging). For example MicadoWISE would make it easy to run the recipes on large amounts of observations with a range of observing conditions to test robustness of recipes against "crappy" observation quality.
- The recipes and workflow should be relatively stable at that point, so fully supporting them in MicadoWISE should then be feasible.
- Long term (~commissioning and beyond):
- The prime reason to include the long term perspective in the meeting is that it provided a framework for points 2 and (the rest of) 3 above. We agreed that it would be useful to be able to store the real data in the archive, and did not discuss it much further.
- The NOVA team will attempt to also include lab data in the archive where this would be useful.
- The NOVA and MPE teams will work out further plans for the (Full)MicadoWISE Consortium Information System, including funding proposals.
We also made some agreements with respect to the design itself. Some of them were informational only, some require changes that will be incorporated by NOVA in the DRLD before handover, and some are changes that will be made by MPE.
- Informational:
- Flatfielding: The flatfielding in the current design is split into two (multiplicative) corrections, “MasterFlat” and “Illumination Correction”. The split is designed this way because the sky Background subtraction (an additive effect) should be applied after the MasterFlat, but before the Illumination Correction (at least in the FDR version of the design). This led to the following design:
- The MasterFlat corrects the pixel-to-pixel differences of the quantum efficiency, assuming a fully uniform calibration lamp. This MasterFlat is created by median stacking without any smoothing, and then normalizing. Applying the MasterFlat will not actually lead to a ‘flat’ image because the illumination pattern of the calibration lamp is not actually uniform.
- The Illumination Correction corrects for the incorrect assumption of a uniform illumination. The assumption at FDR was that the illumination pattern is smooth and can be fully parameterized. The Illumination Correction is therefore not an image but a table (containing the coefficients of this parametrization).
- Distortions: The design of the distortion correction at FDR is twofold:
- The DistortionELT and DistortionWAM will allow a low-order correction of the distortion. The DistortionELT and DistortionWAM are both derived by the calibration team and provided as a static calibration data product. (During the meeting, we assumed that the DistortionWAM is created by the user. However, in the FDR design, this is not the case, see the Calibration Plan.)
- The workflow for Astrometric Imaging includes a higher-order correction that is derived from the science data itself.
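The order of the flatfielding steps described under "Informational" can be illustrated with a small Python sketch. It is not the FDR implementation. It assumes that the MasterFlat is normalized to a mean of 1, that both multiplicative corrections are applied by division, that the sky background is a single constant, and that the illumination pattern is a plane a + b*x + c*y. All function names are hypothetical.

```python
from statistics import median


def make_master_flat(flat_frames):
    """Median-stack raw flats pixel by pixel (no smoothing), then normalize.

    Normalizing to a mean of 1 is an assumption made for this sketch.
    """
    n_rows = len(flat_frames[0])
    n_cols = len(flat_frames[0][0])
    stacked = [
        [median(frame[r][c] for frame in flat_frames) for c in range(n_cols)]
        for r in range(n_rows)
    ]
    mean_level = sum(sum(row) for row in stacked) / (n_rows * n_cols)
    return [[value / mean_level for value in row] for row in stacked]


def illumination(coefficients, x, y):
    """Evaluate the parametrized illumination pattern (assumed to be a plane)."""
    a, b, c = coefficients
    return a + b * x + c * y


def calibrate(raw, master_flat, sky_background, coefficients):
    """Apply MasterFlat, then sky subtraction, then Illumination Correction."""
    result = []
    for y, row in enumerate(raw):
        out_row = []
        for x, value in enumerate(row):
            flattened = value / master_flat[y][x]        # multiplicative, first
            sky_subtracted = flattened - sky_background  # additive, in between
            out_row.append(sky_subtracted / illumination(coefficients, x, y))
        result.append(out_row)
    return result


# Three tiny 2x2 flats; the outlier in the last frame is rejected by the median.
flats = [
    [[1.0, 2.0], [2.0, 3.0]],
    [[1.0, 2.0], [2.0, 3.0]],
    [[50.0, 2.0], [2.0, 3.0]],
]
master_flat = make_master_flat(flats)
science = calibrate([[5.0, 10.0], [10.0, 15.0]], master_flat, 2.0, (1.0, 1.0, 0.0))
```

Storing only the coefficients (here a, b, c) is what makes the Illumination Correction a table rather than an image.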
- Before handover:
- The persistence correction will not require a separate recipe anymore. This will make the imaging pipeline equivalent to the spectroscopy pipeline.
- The nonlinearity correction will be the first correction in the workflow.
- The naming convention will be clarified, in particular of the data items and DRL functions. The actual names will be updated on a best-effort basis, because the actual data items and DRL functions will depend on implementation choices from MPE.
- The PRO.CATG values will be kept as-is, because they are an integral part of the current pipeline design and changing them would require a full overhaul of the DRLD which is undesirable.
- The granularity of the recipes will be kept as is. In particular, recipes will run on single exposures whenever feasible. (Still triggered by a full template though.)
- The creation of the DistortionELT (and DistortionWAM?) data product will be removed from the Astrometric Imaging association matrix, because it is produced by the calibrations team and not by individual scientists.
- After handover:
- The derivation of the nonlinearity will use the det_mon software from ESO.
- The PRO.CATG (and therefore DO.CATG) values will be changed to the liking of MPE, probably to be identical to the name of the data items.
- The MPE team will decide whether to update the OCA rules to reflect the new PRO.CATG values or whether to directly create EDPS rules.
- The granularity of the recipes will be discussed further between MPE and ESO.
- MPE will create smaller utility recipes where useful, or at least structure the processing in DRL functions such that it would be easy to create such smaller utility recipes (e.g. by the NOVA team).
- MPE will consider adding association matrices for data products produced by the calibration team, e.g. Nonlinearity, Persistence, DistortionELT, etc.
Handy links
- GijsToSelf: mailthreadYixian, mailthreadYves, RIXs