Chapter 7: Jupyter Notebooks and SciServer
The Jupyter Notebooks access the metadata database through SciServer, an established platform built and supported by the Institute for Data Intensive Engineering and Science (IDIES) at Johns Hopkins University. SciServer builds upon and extends SkyServer, the system of server-side tools that introduced the astronomical community to SQL (Structured Query Language) and that provides the Sloan Digital Sky Survey catalog data to the public. SciServer is particularly appealing because, although it was originally designed to support astronomy research, it has expanded to include several research and education tools that make access to hundreds of terabytes of astronomical data easy and intuitive for researchers, students, and the public [9,10]. The current SciServer system has scaled these tools out to support multiple science domains and can be applied to any form of data, including data from oceanography, mechanical engineering, the social sciences, and finance. In addition, SciServer features a learning environment that is used in K-12 and university education in a variety of formal and informal contexts.
The team has developed a full schema and data dictionary, along with Jupyter Notebooks that are accessible through SciServer. Users first register an account at https://apps.sciserver.org and then contact the team at sciserver-helpdesk@jhu.edu to request access to Democratizing Data resources, stating the reasons for the request and their SciServer username.
Once access has been granted, example Jupyter notebooks become available on a dataset-specific volume corresponding to the granted access. Users are expected to be familiar with Jupyter as an interface, to have a basic understanding of scripting (typically in Python), and to be comfortable with retrieving data using SQL. For more information on how to get started with SciServer, please visit the help pages at https://sciserver.org/support/.
With a SciServer account and the appropriate permissions, a user can query Democratizing Data databases through either the CasJobs interface (see https://www.sciserver.org/about/casjobs/) or the CasJobs Python SDK (see https://www.sciserver.org/docs/sciscript-python/SciServer.html#module-SciServer.CasJobs).
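As a rough illustration of the SDK route, the sketch below wraps the `CasJobs.executeQuery` call from the SciScript-Python package. It assumes the code runs inside a SciServer Compute notebook, where the package is installed and the session is already authenticated. The helper name `run_casjobs_query` and the table-listing query are illustrative and not part of the project's notebooks; whether a given database context allows reading the `INFORMATION_SCHEMA` views is also an assumption.

```python
# Illustrative helper (hypothetical name, not taken from the project notebooks).
def run_casjobs_query(sql, context):
    """Run a SQL query in the given CasJobs database context.

    Returns the result as a pandas DataFrame.
    """
    # Imported inside the function so this file can still be loaded
    # on machines where the SciServer package is not installed.
    from SciServer import CasJobs

    return CasJobs.executeQuery(sql=sql, context=context, format="pandas")


# A schema-agnostic query: list the tables visible in the chosen context.
# Assumes the context permits reading the INFORMATION_SCHEMA views.
LIST_TABLES_SQL = (
    "SELECT TABLE_NAME "
    "FROM INFORMATION_SCHEMA.TABLES "
    "ORDER BY TABLE_NAME"
)

# Example call from a SciServer Compute notebook. The context name below
# is a placeholder; use a database you have actually been granted.
# tables = run_casjobs_query(LIST_TABLES_SQL, "DemocratizingData_EXAMPLE_AGENCY")
# print(tables.head())
```

Listing tables first is a low-risk way to confirm that access has been granted before writing queries against the schema described in the appendices.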
In addition to a master database (ShowUsTheData v3), which contains records for all the agencies processed, there are databases for individual agencies (each possibly containing multiple so-called “runs”) with names of the form “DemocratizingData_{AGENCY_NAME}”. These agency-level databases provide data in a form closer to that of the API (see Chapter 8), e.g., validated data without licensed snippet information. Depending on their access level, individual users may see some, all, or none of these databases. In general, users of the system (as opposed to administrators) are given read-only access to one or more DemocratizingData_{AGENCY_NAME} databases according to their individual requirements. For more information on the database schema, please see Appendix A and Appendix B. For more information on querying databases within the SciServer environment, please see https://sciserver.org/support/help/#casjobs, or the examples in the notebooks and documentation available on the Democratizing Data-related volumes on SciServer.
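The naming pattern above can be captured in a small helper that builds the database (context) name for an agency. This is only a sketch: the function name, the placeholder agency, and the normalisation of spaces and punctuation to underscores are assumptions for illustration, and the actual database names visible to a given account should be checked in CasJobs.

```python
import re

# Prefix taken from the documented pattern DemocratizingData_{AGENCY_NAME}.
DB_PREFIX = "DemocratizingData_"


def agency_database_name(agency):
    """Build a database name following DemocratizingData_{AGENCY_NAME}.

    Runs of non-word characters (spaces, punctuation) are collapsed to a
    single underscore. This normalisation is an assumption, not a
    documented rule.
    """
    cleaned = re.sub(r"\W+", "_", agency.strip()).strip("_")
    if not cleaned:
        raise ValueError("agency name must contain at least one letter or digit")
    return DB_PREFIX + cleaned


# "EXAMPLE_AGENCY" is a placeholder, not a real agency database.
print(agency_database_name("EXAMPLE_AGENCY"))
```

The resulting string would be passed as the database context when submitting a query, so that the query runs against the agency-level database rather than the master database.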