A rudimentary implementation of a basic document (NoSQL) database in Go
Standalone repository for mongodb's BSON C++ Implementation
dnslink resolution in go-ipfs
[ARCHIVED] [DEPRECATED] logging the development of an interplanetary filesystem
Simple tarball encryption
d3.js-based tool to visualize network communication for arbitrary protocols
package manager for datasets
Software from scratch. An OS with the heart of a database.
started time in 5 days
Btw, tons of great comments overall here, both in the original proposal (very well thought out!) and in all the technical comments -- solid discussion so far. Keep it going; let's figure this out together.
comment created time in 24 days
Some quick thoughts as well. First some background -- feel free to skip down to the last part ("some thoughts for this issue now").
network improvement principles
- It's important for the filecoin network & community to respond to major world events (natural disasters, policy changes, etc) in a resilient way -- lots of challenges will come that must be overcome together
- Improvements and upgrades to community processes, services, software, and protocol should serve the needs of the whole filecoin network, and all its participants
- Everything needs to be carefully balanced and incentive-aligned; all participants should be doing better together. (Aim for pareto improvements, growing utility and value for all wherever possible.)
- ("never waste a good crisis") Use problems to motivate solutions that leave things substantially better than they were before.
storage network principles
- Client data must be stored safely at all times (no loss).
- Client data must be available at all times (zero downtime).
- Client requests must be performant at all times (no or little degradation).
- Storage networks must keep many copies of data around the world to be resilient to problems.
- Storage providers must be able to relocate their operations (without it being cost prohibitive).
- Moving data is best done by transferring the bits live over the internet, verifying the copy succeeded. This may not be possible or cost-efficient for very large amounts of data (many PiB or EiB).
- Moving physical drives is tricky. Cheap HDDs break easily. It may be best to transfer in SSDs, or with high replication factors (use erasure coding / RAID schemes).
- Moving hardware is expensive -- sometimes it may be cheaper to buy new hardware and sell old hardware than to move old hardware. This will vary per operation.
- Moving large operations is more resiliently done by parts / streaming (putting all hardware into one single storage container risks complete loss -- best to ship by parts).
- 1. Live migration. Most computing systems in the world aim to achieve live migrations, where service to clients is not interrupted (no loss, no downtime, high performance).
- Services that have many copies/endpoints can live migrate by shutting down any one copy and resuming it elsewhere.
- Individual services can be live-migrated by first replicating the service (i.e., copying and moving the data) and then redirecting traffic to the new service endpoint (the switch). After the service has migrated, the old copies/resources can be reclaimed.
- 2. Suspended migration. Some services cannot (or are too expensive to) migrate live. In these cases, service is suspended gracefully, service for clients degrades for some pre-announced time, and then quality of operation resumes.
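The live-migration sequence above (replicate, verify, switch, reclaim) can be sketched in Go. This is a minimal illustration under stated assumptions: the `Endpoint` type, `liveMigrate` function, and addresses are all hypothetical names invented here, not Filecoin or lotus APIs; an in-memory copy stands in for the network transfer.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// Endpoint is a hypothetical stand-in for a storage service location.
type Endpoint struct {
	Addr string
	Data []byte
}

// liveMigrate moves service to a new endpoint with no downtime: the old
// endpoint keeps serving until the copy is verified and traffic is switched.
func liveMigrate(old *Endpoint, newAddr string) (*Endpoint, error) {
	want := sha256.Sum256(old.Data)

	// 1. Replicate: copy the data over (stands in for a live network transfer).
	replica := &Endpoint{Addr: newAddr, Data: append([]byte(nil), old.Data...)}

	// 2. Verify the successful copy before touching the original.
	got := sha256.Sum256(replica.Data)
	if !bytes.Equal(want[:], got[:]) {
		return nil, fmt.Errorf("copy verification failed; keep serving from %s", old.Addr)
	}

	// 3. Switch: clients are redirected to the new endpoint here.
	// 4. Reclaim: only now can the old copy's resources be released.
	old.Data = nil
	return replica, nil
}

func main() {
	src := &Endpoint{Addr: "dc-east", Data: []byte("sealed sector bytes")}
	cur, err := liveMigrate(src, "dc-west")
	if err != nil {
		panic(err)
	}
	fmt.Println("now serving from", cur.Addr)
}
```

The ordering is the whole point: verification happens before the switch, and reclamation happens after it, so a failed copy never interrupts service.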
Specifics in filecoin today
- Filecoin achieves network-level resilience by encouraging many copies across service providers, in different regions -- client data should have many copies and therefore tolerate failure of single storage providers.
- The DeclareFaults mechanism already takes into account a grace period for individual storage providers to suspend operations and resume them afterward.
- Migrating deal sectors can be done by moving the sealed sectors, or by retrieving copies of the data and re-sealing it (whichever is cheaper to the storage provider). But it is critical not to lose the data, and not to lose service uptime (we can't have all the storage providers storing all the copies shutting down at the same time, or service uptime will be lost; we can't have all storage providers hoping to retrieve the data from each other, or the data will be lost).
- Migrating committed capacity sectors can be done by moving the sealed sectors, or re-sealing from scratch (whichever is cheaper to the storage provider). Migrating committed capacity does not need high availability/uptime itself.
- Power in consensus (which committed capacity contributes to, and currently is the larger fraction) must not fluctuate wildly -- this is dangerous from a consensus standpoint. So committed capacity shouldn't shut down and resume in too large amounts (traditional consensus bounds apply). However, the network can indeed tolerate graceful decreases and re-increases in power (we've already seen large amounts of power fault and resume without any problems).
- Incentives should always reward high uptimes and availability of all services, and penalize downtimes.
- Long downtimes are not good for the network, whether in terms of client data, committed capacity, or consensus power. 180 days is way, way too long -- that should not count as a migration.
Here are some thoughts for this issue now
- Storage providers should endeavor to live migrate deal sectors -- that means copying the data over and setting up a new endpoint before shutting down the old one. This should not be a big problem today, as all the deal data is ~32 PiB, and only a small fraction needs to migrate. The two-week DeclareFaults grace period, including its fees, is already the right incentive structure for client data.
- Committed capacity can undergo suspended migration -- and the window could be longer than 2 weeks (relocating EiB is not easy), as long as we don't create a problem for consensus power. If we have graceful suspension and resumption of power, and an appropriate incentive structure (either higher collateral or a network fee), the network could tolerate longer migrations of capacity (4 wk?) without loss of sectors (and collaterals).
- Suspension of a fraction of CC sectors per actor -- graceful consensus power changes. One way might be to enable suspended (dormant) migration of CC sectors up to some % of the miner's total power at any time (20%?). (The network fee or stake should make it irrational to do this in general, and irrational to reuse storage space / interleave in time to inflate storage power -- this means the network fee or TVM has a lower bound of the CC sector reward.) This could allow a storage provider to migrate hardware in parts and bring it online elsewhere. This might be hard to do with the software today.
- Suspension of whole actors. I think this is hard to make work in general. If an actor has any deal data, this should not be allowed, or it should require some special-casing to keep user data available during the period. We would need to measure the amount of suspended power across the network and rate-limit it (hard limits as a percentage of total QA power, or a dynamic fee structure that increases with that percentage). This is hard to balance and creates bad incentive structures for everyone (who gets to go first?) -- I would recommend against this, unless some very simple solution that balances these problems emerges. If time is important, I think this will likely take longer to figure out and implement.
- Fastest option to implement: extend the fault period from 2 wk to 4 wk or 6 wk. This would need the right fee structure calculated, but it might be the easiest/fastest to design and implement.
- Ocean freight can take about 15-30 days to cross an ocean, so 4-6 weeks might be the right number for a maximum suspended-migration time limit. (I'm not sure ocean freight is indeed cheaper than air freight though, all costs considered, including time and lost block rewards...)
- Business for someone on the Filecoin island economy: buy or rent a jet, paint it filecoin colors, help miners relocate, charge in FIL. (precedent).
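The per-actor cap proposed above (suspended CC sectors limited to some % of a miner's total power at any time) amounts to a simple admission check. A minimal sketch in Go, where the `canSuspend` name, the raw power units, and the 20% figure are illustrative assumptions from this discussion, not protocol parameters:

```go
package main

import "fmt"

// canSuspend reports whether a miner may put `request` more power into
// suspended (dormant) migration, given power already suspended and a cap
// expressed as a percentage of the miner's total power. Integer math
// avoids floating point: (suspended+request)/total <= capPct/100.
func canSuspend(totalPower, alreadySuspended, request, capPct uint64) bool {
	return (alreadySuspended+request)*100 <= totalPower*capPct
}

func main() {
	// A miner with 10,000 units of power and 1,000 already suspended.
	fmt.Println(canSuspend(10_000, 1_000, 1_000, 20)) // stays within the 20% cap
	fmt.Println(canSuspend(10_000, 1_000, 2_000, 20)) // would exceed the cap
}
```

The network-wide version of this check (rate-limiting total suspended power as a fraction of QA power) would have the same shape, just applied to aggregate rather than per-actor totals.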
comment created time in 24 days
thanks very much! 😄
comment created time in 25 days
- Prior Context. Some recent PRs changed the repo so that websites could be built automatically. That is cool, and a great idea. :+1: (thanks for doing that!) However, it changed the layout of the repo to prioritize tooling and broke links.
- Prioritize humans. This repo is for humans, and for linking in various community forums. Lots of important deep links into the FIPs have broken and need to be restored. Changing the layout of this repo to prioritize tooling is not good, and the layout should be reverted.
- Tooling in branches or subdirs. You can still automatically generate things, but please move that into other branches (auto builds) or subdirectories (build bots can look into subdirectories far more easily than humans can rewrite links everywhere on the internet).
- Git History. Also, please bring back the git history for all FIPs; that is pretty important community information. If the history is lost on GitHub, I'm sure some of us have repo copies and we can likely find it.
created time in a month