Service Functionality

User Functionality

The current UoM service offering utilises a subset of Mediaflux's capabilities (the available functionality will grow over time). UoM currently offers a turn-key project capability:

  • Projects
    • A project namespace (think of it as a cloud-based file system) where you and your team can securely store and share your data collection
    • You can structure your project namespace however you wish
    • You can associate discoverable meta-data with the structure and with the files that you upload
  • Authentication
    • You can log in with your institutional credentials or with a local account
      • For University of Melbourne staff and students, you can log in directly with your institutional credential
      • For users from other institutions that are members of the Australian Access Federation (AAF), you can log in with your institutional credential via the AAF
    • For other users, you can log in with a local account created for you
  • Authorisation
    • Whichever account you log in with, it must be granted roles (a task performed by the Mediaflux support team) before it is authorised to access resources
    • Standard roles are created per project (admin, read/create/destroy, read/create, read) and can be assigned to project team members (see the role sketch after this list)
  • Access Protocols
    • Projects can be accessed via
      • HTTPS (Browser-based access and various Java clients)
      • SMB (i.e. a network file share)
      • sFTP (e.g. FileZilla, CyberDuck; see the scripted example after this list)
      • NFS (only after discussion with the ResPlat Data Team)
  • Data Movement
  • Encryption (discuss with ResPlat Data Team)
    • HTTPS protocol only: files can be encrypted at the storage layer (protection against unauthorised access to the system back end only). Other protocols will be supported in the future.
    • Selected meta-data can be encrypted (protection against unauthorised access to the system back end only)
  • Backup
    • A second Mediaflux server runs at the Noble Park data centre. This is known as the Disaster Recovery (DR) server. Its only job is to receive copies of data.
    • The DR server is not used for fail-over; that is, if the primary system fails, we cannot switch operations over to the DR server.
    • Mediaflux assets (the container of meta-data and data content [e.g. an uploaded file]) are versioned. Whenever your assets change (e.g. modify the meta-data or content) a new version is created.
    • The backup process copies all asset versions to the DR server. When a new asset version is created, that new version is sent to the DR server and attached to the appropriate asset (see the toy replication model after this list).
    • Therefore, there are 2 copies of your data managed by Mediaflux (one on the primary system and one on the DR system).
    • Data that have been destroyed before they are backed up to the DR server cannot be recovered.
    • Data that have been destroyed on the primary server and that have been copied to the DR server are retrievable on request (an administration task).
    • There is no user-controlled process that can delete data on the DR server.
    • Backup processes run frequently. Data are typically copied to the DR server within 24 hours, but this is not guaranteed because it depends on the amount of data being handled.
  • High Availability
    • The primary controller (the Mediaflux server that users log in to and interact with) is part of a High Availability pair. If one fails, the service can be moved to the other.
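
To make the standard role hierarchy concrete, here is a minimal Python sketch of how the four per-project roles listed under Authorisation might map to cumulative permissions. The role and permission names are illustrative assumptions only, not the Mediaflux API.

  # Illustrative model of the standard per-project roles (not the Mediaflux API).
  # Each role grants a cumulative set of permissions on project assets.
  ROLE_PERMISSIONS = {
      "read":                {"read"},
      "read/create":         {"read", "create"},
      "read/create/destroy": {"read", "create", "destroy"},
      "admin":               {"read", "create", "destroy", "administer"},
  }

  def can(role, action):
      """Return True if the given role permits the given action."""
      return action in ROLE_PERMISSIONS.get(role, set())

  assert can("read/create", "create")
  assert not can("read", "destroy")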
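
As a scripted alternative to GUI sFTP clients such as FileZilla and CyberDuck, the sFTP endpoint can also be driven from code. The sketch below uses the third-party paramiko library; the hostname, port, paths and credentials are placeholder assumptions, so substitute the connection details supplied by the ResPlat Data Team.

  # Minimal sFTP sketch using paramiko (pip install paramiko).
  # Host, port, paths and credentials are placeholders, not the real endpoint.
  import paramiko

  HOST = "mediaflux.example.org"  # placeholder: use the address given by ResPlat
  PORT = 22                       # placeholder: confirm the sFTP port with ResPlat

  transport = paramiko.Transport((HOST, PORT))
  transport.connect(username="your_username", password="your_password")
  sftp = paramiko.SFTPClient.from_transport(transport)
  try:
      # List the top level of a project namespace, then upload a file.
      print(sftp.listdir("/projects/my_project"))
      sftp.put("results.csv", "/projects/my_project/results.csv")
  finally:
      sftp.close()
      transport.close()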
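
The interplay between asset versioning, destruction and DR replication described under Backup can be summarised with a toy model. The Python below is illustrative only, not how Mediaflux is implemented; it shows why a version destroyed before the next backup run is unrecoverable, while a version already copied to the DR server survives and is retrievable on request.

  # Toy model of versioned assets replicated from the primary to the DR server.
  # Purely illustrative; this is not how Mediaflux is implemented.
  primary = {}  # asset id -> list of versions on the primary system
  dr_copy = {}  # asset id -> list of versions received by the DR server

  def save(asset_id, content):
      """Each change to an asset creates a new version on the primary."""
      primary.setdefault(asset_id, []).append(content)

  def replicate():
      """The backup process copies any versions the DR server has not yet seen."""
      for asset_id, versions in primary.items():
          seen = len(dr_copy.get(asset_id, []))
          dr_copy.setdefault(asset_id, []).extend(versions[seen:])

  def destroy(asset_id):
      """User-driven destruction affects only the primary, never the DR copy."""
      primary.pop(asset_id, None)

  save("doc", "v1")
  replicate()            # v1 now exists on both systems
  save("doc", "v2")
  destroy("doc")         # v2 destroyed before the next backup run...
  replicate()
  print(dr_copy["doc"])  # ['v1']: v1 is retrievable on request, v2 is lost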

Other Relevant Operational Functionality

  • Database Backups
    • The database (the component on the primary controller server that maintains all your meta-data and knowledge about assets, i.e. files) is exported and saved every three hours.
    • These exports are retained for 2 weeks. This means that if the DB should become corrupted, up to 3 hours of newly arrived data may no longer be known to the restored system (the data exist, but the system would have no record of them).
    • The DB backups are further synced to the Noble Park DR system (when they are removed from the primary after 2 weeks, they are also removed from the DR server); see the worked numbers at the end of this section.
  • Scalability
    • The primary system consists of a controller node (handling database transactions) and 2 IO nodes. The IO nodes are used to actually move data to and from the storage. More IO nodes can be added as needed.
      • The IO nodes are currently only utilised for the HTTPS protocol (SMB support is coming)
    • The underlying storage is provided via a highly scalable CEPH cluster. More nodes can be added to the cluster as needed.
    • The combination of the scalable Mediaflux cluster and the scalable CEPH cluster provides a very extensible environment as our data movement needs grow.
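
To make the database backup schedule concrete, the small Python calculation below derives the retention and worst-case-gap figures directly from the numbers stated above (an export every three hours, retained for two weeks).

  # Figures implied by the database backup schedule described above.
  EXPORT_INTERVAL_HOURS = 3
  RETENTION_DAYS = 14

  exports_retained = RETENTION_DAYS * 24 // EXPORT_INTERVAL_HOURS
  print(exports_retained)       # 112 exports kept at any one time
  print(EXPORT_INTERVAL_HOURS)  # worst case: up to 3 hours of metadata unrecorded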